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HYPOELLIPTIC DIFFUSION MAPS I: TANGENT BUNDLES 


TINGRAN GAO 


Abstract. We introduce the concept of Hypoelliptic Diffusion Maps (HDM), 
a framework generalizing Diffusion Maps in the context of manifold learning 
and dimensionality reduction. Standard non-linear dimensionality reduction 
methods (e.g., LLE, ISOMAP, Laplacian Eigenmaps, Diffusion Maps) focus on 
mining massive data sets using weighted affinity graphs; Orientable Diffusion 
Maps and Vector Diffusion Maps enrich these graphs by attaching to each node 
also some local geometry. HDM likewise considers a scenario where each node 
possesses additional structure, which is now itself of interest to investigate. 
Virtually, HDM augments the original data set with attached structures, and 
provides tools for studying and organizing the augmented ensemble. The goal 
is to obtain information on individual structures attached to the nodes and on 
the relationship between structures attached to nearby nodes, so as to study 
the underlying manifold from which the nodes are sampled. In this paper, 
we analyze HDM on tangent bundles, revealing its intimate connection with 
sub-Riemannian geometry and a family of hypoelliptic differential operators. 
In a later paper, we shall consider more general fibre bundles. 
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1. Introduction 


Acquiring complex, massive, and often high-dimensional data sets has become a 
common practice in many fields of natural and social sciences; while inspiring and 
stimulating, these data sets can be challenging to analyze or understand efficiently. 
To gain insight despite the volume and dimension of the data, methods from a 
wide range of science fields have been brought into the picture, rooted in statistical 
inference, machine learning, signal processing, to mention just a few. 

Among the exploding research interests and directions in data science, the rela¬ 
tion between the graph Laplacian [23] and the manifold Laplacian [76] has emerged 
as a useful guiding principle. Specifically, the field of non-linear dimensional¬ 
ity reduction has witnessed the emergence of a variety of Laplacian-based tech¬ 
niques, such as Locally Linear Embedding (LLE) [77] , ISOMAP [93] , Hessian Eigen- 
maps [32], Local Tangent Space Alignment (LTSA) [100], Diffusion Maps [27], 
Orientable Diffusion Maps (ODM) [83], Vector Diffusion Maps (VDM) [81], and 
Schrodinger Eigenmaps [95] . The general practice of these methods is to treat each 
object in the data set (these objects could be images, texts, shapes, etc.) as an 
abstract node or vertex, and form a similarity graph by connecting each pair of 
similar nodes with an edge, weighted by their similarity score. Built with varying 
flexibility, these methods provide valuable tools for organizing complex networks 
and data sets by “learning” the global geometry from the local connectivity of 
weighted graphs. 

The Diffusion Map (DM) framework [27, 56, 25, 26, 28, 83, 81] proposes a prob¬ 
abilistic interpretation for graph-Laplacian-based dimensionality reduction algo¬ 
rithms. Under the assumption that the discrete graph is appropriately sampled 
from a smooth manifold, it assigns transition probabilities from a vertex to each of 
its neighbors (vertices connected to it) according to the edge weights, thus defining 
a graph random walk the continuous limit of which is a diffusion process [97, 35] 
over the underlying manifold. The eigenvalues and eigenvectors of the graph Lapla¬ 
cian, which converge to those of the manifold Laplacian under appropriate assump¬ 
tions [7, 8], then reveal intrinsic information about the smooth manifold. More 
precisely, [9] proves that these eigenvectors embed the manifold into an infinite di¬ 
mensional P space, in such a way that the diffusion distance [27] (rather than the 
geodesic distance) is preserved. Appropriate truncation of these sequences leads to 
an embedding of the smooth manifold into a finite dimensional Euclidean space, 
with small metric distortion. 

Under the manifold assumption, [83, 81] recently observed that estimating ran¬ 
dom walks and diffusion processes on structures associated with the original mani¬ 
fold (as opposed to estimates of diffusion on the manifold itself) are able to handle 
a wider range of tasks, or obtain improved precision or robustness for tasks con¬ 
sidered earlier. For instance, [83] constructed a random walk on the orientation 
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bundle [17, §1.7] associated with the manifold, and translated the detection of ori- 
entability into an eigenvector problem, the solution of which reveals the existence 
of a global section on the orientation bundle; [81] introduced a random walk on the 
tangent bundle associated with the manifold, and proposed an algorithm that em¬ 
beds the manifold into an P space using eigen-vector-fields instead of eigenvectors 
(and thus the name Vector Diffusion Maps (VDM)). In [98] the VDM approach is 
used, analogously to [9], to embed the manifold into a finite dimensional Euclidean 
space. Although the VDM embedding does not reduce the dimensionality as much 
as standard diffusion embedding methods, it benefits from improved robustness to 
noise, as illustrated by the analysis of some notoriously noisy data sets [49, 50]. 

Both [83] and [81] incorporate additional structures into the graph Laplacian 
framework: in [81] this is an extra orthogonal transformation (estimated from local 
tangent planes) attached to each weighted edge in the graph; in [83] the edge 
weights are overwritten with signs determined by this orthogonal transformation. 
These methods are successful because they incorporate more local geometry in 
the path to dimensionality reduction, by estimating tangent planes. In fact, the 
advantage of utilizing local geometric information from the tangent bundle had been 
noticed earlier: Fig. 1 shows a simple example, borrowed from [56, §2.6.1], where 
the original data set (shown in Fig. 1(a)) is a Descartes Folium with self-intersection 
at the origin, parametrized by 


x{e) = 


3tan0 
1 -I- tan^ 9 ’ 


y{^) 


3 tan^ 9 
1 -I- tan^ 9 ’ 


9 G 


■ TT TT' 
.~ 2 ’ 2 . ■ 


This curve is the projection onto a plane of a helix in A standard isotropic 
random walker on the planar curve would get lost at the intersection, even when 
sober, as shown in Fig. 1(b), where the embedding completely mixes blue and red 
tails beyond the crossing point. In contrast, incorporating tangent information 
into local similarity scores yields a much more clear embedding back to (see 
Fig. 1(c)), which blows up (in the sense of complex algebraic geometry [42, pp.l82]) 
the self-intersecting curve at its singularity and unraveled its hidden geometry. 
Specifically, the similarity measure used in the modified diffusion map between any 
pair of points {x {9i) ,y (0i)) and {x (^ 2 ), y {^ 2 )) on the curve is 


d{{x{9i) ,y{9i)) ,{x {92),y{92))f = ||(a:: {9i),y{9i)) - (x {92) ,y {92))\\l 


(^' {9i),y' {9i)) 

{x' {92),y' {92)) 

\\{x'{9,),y'{9M2 

\\{x'{92),y'{92))\\2 


where /r > 0 is a parameter that balances the two contributions to the dissimilarity 
score in consideration. (Two distinct tangent vectors exist at the self-intersection, 
but they each belong to a distinct point in the parametrization.) It is possible to 
use the methodology of ODM and VDM to tackle similar problems in much broader 
contexts, where the local geometric information can be of a different type than in¬ 
formation about tangent planes. Indeed, for many data sets, a single data point 
has abundant structural details; typically graph-Laplacian-based methods begin by 
“abstracting away” these details, encoding only pairwise similarites. In some cir¬ 
cumstances, the hidden details (pixels in an image, vertices/faces on a triangular 
mesh, key words and transition sentences in a text, etc.) may themselves be of 
interest. For example, in the geometry processing problem of analyzing large col¬ 
lections of 3D shapes, it is desirable to enable user exploration of shape variations 
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Figure 1. A Diffusion Map Incorporating Local Geometric Information 


across the collection. In this case, abstracting each single shape as a graph node 
completely ignores the spatial configuration of an individual shape. On the other 
hand, even when sticking to pairwise similarity scores significantly simplifies the 
data manipulation, the best way to score similarity is not always clear. In prac¬ 
tice, the similarity measure is often dictated by practical heuristics, which may be 
misguided for incompletely understood data. In addition, there are situations for 
which it can be proved that no finite-dimensional representation will do justice to 
the data. (For instance, in topological data analysis of shapes and surfaces, the only 
known sufficient statistics (other than the data set itself) is the set of all persistent 
diagrams taken from all directions [94].) 

In this paper, we propose the Hypoelliptic Dijfusion Map (HDM), a new graph- 
Laplacian-based framework for analyzing complex data sets. This method focuses 
on data sets in which pairwise similarity between data points is not sufficiently 
informative, but each single data point carries sophisticated individual structure. 
In practice, this type of data set often arises when the data acquired is too noisy, 
has huge degrees of freedom, or contains un-ordered features (as opposed to se¬ 
quential data). An example that has all these characteristics is, e.g., a data set 
in which each data point is a two-dimensional surface in represented either by 
a triangular mesh or a collection of persistent diagrams. In many cases, comput¬ 
ing pairwise similarity within such data sets requires minimizing some functional 
over the space of admissible pairwise correspondences, and the similarity score be¬ 
tween two surfaces is achieved by a certain optimal correspondence map between 
the surfaces. It is conceivable that the optimal correspondence contains substantial 
information, missing from the condensed similarity score. The HDM framework is 
our first attempt at mining this hidden information from correspondences. 

Like ODM and VDM, HDM generalizes the DM framework, but it takes an 
essentially different path. We are most interested in the scenario in which the 
individual structures themselves are also manifolds. In order to take them into 
consideration, we first augment the manifold underlying DM, denoted as M, with 
extra dimensions. To each point x on M, this augmentation attaches the individual 
manifold at a;, denoted as F),; we assure that around each x € M there exists 
an open neighborhood U such that on U the augmented structure “looks like” a 
product of U with a “universal template” manifold F. Intuitively, M plays the role 
of a “parametrization” for all the F^.. Of course, the existence of such a universal 
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template makes sense only if the Fx,x € M are compatible in some appropriate 
sense (each should at least be diffeomophic to F; we shall add more restrictions 
below); however, such compatibility is not uncommon for many data sets of interest, 
as we shall see in Section 2. This picture of parametrizing a family of manifolds 
with an underlying manifold is reminiscent of the modern differential geometric 
concept of a fibre bundle, which played an important role in the development of 
geometry, topology, and mathematical physics in the past century. Therefore, we 
shall refer to this geometric object as the underlying fibre bundle of the data set. 
Adopting the terminology from differential geometry, we call M the base manifold, 
the universal template manifold F the fibre, and each F^ a fibre at x. 

The probabilistic interpretation of HDM is a random walk on the fibre bundle. 
In one step, the transition occurs either between points on adjacent but distinct 
fibres, or within the same fibre. Since the fibre bundle is itself a manifold (referred 
to as the total manifold, denoted as E), this looks so far no different from a di¬ 
rect application of DM, only on an augmented geometric object. However, HDM 
also incorporates the pairwise correspondences of data points in the fibre bundle 
formulation, by requiring transitions between distinct fibres to satisfy certain di¬ 
rectional constraints imposed by the correspondences. The resulting random walk 
is no longer a direct analogy of its standard counterpart on the total manifold, but 
rather a “lift” of a random walk on the base manifold M. Under mild assumptions, 
its continuous limit is a diffusion process on the total manifold E, infinitesimally 
generated by a hypoelliptic differential operator [46] (thus the name HDM). We can 
then embed the whole fibre bundle into a Euclidean space using the eigenvectors of 
this hypoelliptic differential operator; discretely this corresponds to solving for the 
eigenvectors of our new graph Laplacian, referred to as a hypoelliptic Laplacian of 
the graph. It turns out that, by varying a couple of parameters in its construction, 
the family of graph hypoelliptic Laplacians contains the discrete analogue of sev¬ 
eral important and informative partial differential operators on the fibre bundle, 
relating the geometry of the base manifold with that of the total manifold. Our 
numerical experiments revealed interesting phenomena when embedding the fibre 
bundle using eigenvectors of these new graph Laplacians. 

Though the HDM framework applies to general fibre bundles, the focus of this 
paper is the study of tangent and unit tangent bundles of Riemannian manifolds; 
in a sequel paper we shall study more general fibre bundles. Note that even though 
the fibre bundles in this paper are the same as for VDM, HDM for tangent bundle 
nevertheless differs from VDM; we shall come back to this below. 

This paper is organized as follows: Section 2 sets up notations and terminology, 
and discusses the meaning of the fibre bundle assumption'. Section 3 describes the 
formulation of HDM in detail; Section 4 characterizes the hypoelliptic graph Lapla¬ 
cians on tangent and unit tangent bundles, and studies their pointwise convergence 
from finite samples; some numerical experiments are shown in Section 5; finally we 
conclude with a brief discussion and propose potentially interesting directions for 
future work. In Appendix A we include preliminaries on the geometry of tangent 
bundles and (as their subbundles) unit tangent bundles. 
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2. Motivating The Fibre Bundle Assumption 


For high-dimensional data generated by some implicit process with relatively 
fewer degrees of freedom, it is often reasonable to assume that the data lie ap¬ 
proximately on a manifold of much lower dimension than the ambient space. In 
the literature on semi-supervised learning, this is often referred to as the manifold 
assumption [6, 101]. The goal of semi-supervised learning is to build a classifier 
based on a partially labeled training set; learning the underlying manifold structure 
of high-dimensional data is often the first step in this practice, not only because it 
reduces the dimensionality, but also due because it simplifies the data and exposes 
the structure. 

Our fibre bundle assumption is a generalization of the manifold assumption. In 
differential geometry, a fibre bundle is a manifold itself, that is structured as a family 
of related manifolds parametrized by another underlying manifold. Following [88], 
a fibre bundle consists of the following data^: 

(1) the total manifold E] 

(2) the base manifold M\ 

(3) the bundle projection tt, a surjective smooth map from E onto M; 

(4) the fibre manifold E, satisfying 

(a) for any x G M, (x) is diffeomorphic to F; 

(b) for any x G M, there exists an open neighborhood 17 of x in M and a 
diffeomorphism fu from 7r“^ ([/) to 17 x F; 

(5) the structure group G, a topological transformation group that acts effec¬ 
tively^ on F, satisfying 

(a) for any x G M and two open neighborhoods U and V that both satisfy 
(4b), the diffeomorphism on F, defined as “freezing the first component 
as a:”, obtained from fy ° f'u 

9uv ■= [(t^v o (x, ■) : F -)■ F, 
is an element gfjy in G, and this correspondence 
X 9uv 

is continuous with respect to the topology on G; 

(b) for any x G M and three open neighborhoods 17, V, W that all satisfy 
(4b), 

9uu — the identity element e of G 

and 


9uv ° 9vw — 9uw- 

The diffeomorphisms in (4b) are also known as local trivializations. For each x on 
the base manifold M, it is conventional to denote the fibre over x as F^ := 7r“^ (x). 
The fibre bundle assumption can now be stated as follows: 


^Strictly speaking, the definition given here is that of a coordinate bundle [88, §2.3]; fibre 
bundles are equivalence classes of coordinate bundles. This distinction is less crucial since in the 
HDM framework we describe the structure of a fibre bundle using coordinates. This is similar to 
how manifold learning uses the notion of a manifold. 

acts effectively on F if g (/) = f for all / S F implies g = e, the identity element of G 
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Assumption 2.1 (The Fibre Bundle Assumption). The data lie approximately 
on a hbre bundle, in the sense that each data object is a subset of a fibre over 
some point on a base manifold. 


Note that in the special case where the fibre manifold F is a single point, the 
fibre bundle is diffeomorphic to its base manifold, and our fibre bundle assumption 
reduces to the manifold assumption. 

The definition of a fibre bundle is technical, especially for the part involving the 
structure group G. The key point is that a hbre bundle is locally a product manifold, 
and these local pieces are carefully patched together so that the product structures 
remain consistent when they intersect. Product manifolds are thus hbre bundles 
by dehnition, but the concept of a hbre bundle becomes interesting only when 
the global geometry gets twisted and exposes non-trivial topology. The Mobius 
band, the Klein bottle, and the Hopf hbration are standard illustrations of this; see 
e.g. [88, §1]. 

At a hrst glance, the hbre bundle assumption imposes strong restrictions on the 
data set structure. However, when understanding the structure of individual data 
points is equally as interesting as understanding the structure of the data set in the 
large, the framework based on the manifold assumption becomes insufficient. For 
instance, in geometric morphormetrics [99], the data sets of interest are collections 
of shapes, i.e., two-dimensional smooth surfaces in and the central problem is 
to infer species and other biological information from shape variations. Under the 
assumption that these variations are governed by relatively few degrees of freedom, 
it is possible to learn manifold coordinates for each shape in the collection (e.g., 
applying the diffusion map to the shape collection based on some pairwise shape- 
distance, e.g., [74, 60, 38, 62, 57, 58, 1]). Yet it is difficult to infer shape variation 
from such coordinates, since the geometry of each individual shape is “abstracted 
away”, collapsing each shape to a single point. To add interpretability to the man¬ 
ifold learning framework in this circumstance, it is a natural idea to learn different 
coordinates for distinct points on the same shape, and simultaneously keep similar 
the coordinates of points belonging to different shapes that are development ally or 
functionally equivalent. This geometric intuition is embodied by the fibre bundle 
assumption. From this point of view, the fibre bundle assumption is but an extra 
level of indirection (borrowing a term from Andrew Koenig’s “fundamental theorem 
of software engineering”) for the manifold assumption. 

The example of shape analysis in geometric morphometics is particularly inter¬ 
esting, because it contains another source of ideas that naturally models the data 
set as a hbre bundle: the global registration problem. Geometric morphometri- 
cians typically select equal numbers of homologous landmarks on each shape in a 
globally consistent manner, then reduce the analysis to investigation of the shape 
space [51, 52] of these landmark points. Along these lines, tools like the Generalized 
Procrustes Analysis (GPA) [40, 34, 53, 41] have been developed in statistical shape 
analysis [33], and software products [69, 71] made available. (Recent progress in this 
area [67, 86, 66, 3, 20] relates semidefinite programming with the little Grothendieek 
problem.) A common basis for these GPA-based methods is that the homology of 
landmark points depends on human input. Manually placing landmarks on each 
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shape among a large collection is a tedious task, and the skill to perform it correctly 
typically requires years of professional training. Recently, automated methods have 
been proposed in this field, based on efficient and robust pairwise surface compar¬ 
ison algorithms [18, 57, 58, 1, 72, 73]. However, biological morphologists typically 
do not compare surfaces merely pairwise: in practice, an experienced morphologist 
uses a large database of anatomical structures to improve the consistency and ac¬ 
curacy of visual interpretation of biological features. This consistency can not be 
trivially achieved by any geometric algorithm that uses only pairwise comparison in¬ 
formation, even when each pairwise comparison is of remarkably high quality. This 
is shown in Fig. 2, where a small set of landmarks is propagated from a Microcebus 
molar to a corresponding Lepilemur molar, along three different paths. Though all 
surfaces A through E are fairly similar to each other (and hence the algorithm in [1] 
guarantees high quality pairwise correspondences), direct propagation of landmarks 
via path A-^B gives a different result from A-^C^B or A-^D^E^B. Using the 
collection {A, B, C, D, E} leads to a more accurate correspondence between A and 
B then an isolated A-B comparison would. 



Figure 2. Non-Triviality in Analyzing a Collection of Teeth (c.f. [18]) 

In the fibre bundle framework, the inherent inconsistency for pairwise-comparison- 
based global registration can be modeled using the concept of the holonomy of 
connections. In the sense of Ehresmann [36], a connection is a choice of splitting 
the short exact sequence 

( 2 . 1 ) O^VE^TE^tt*TM ^0 

In this short exact sequence, TE is the tangent bundle of the total manifold E] 
VE is the vertical tangent bundle of E, a subbundle oiTE spanned by vectors that 
are tangent not only to E at some point u € E, but also to the fibre F)r(u) over 
TT (m) € M; n*TM is the pullback bundle of TM to TE. The practical meaning of 
this definition is as follows: since the fibre Ej. over x G M carries manifold structure 
for itself, the notion of vectors that are “tangent to the fibre” is well-defined; they 
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correspond to VE. The short exact sequence (2.1) tells us that the quotient bundle 
of Ti? by Fi? is isomorphic to Tr*TM, but there is no canonical way to choose a 
“horizontal tangent bundle” HE for TE such that 


HE®VE = TE. 


The definition of an Ehresmann connection is just the choice of such a subbundle 
HE. More concretely, a connection specifies for each point u G E a subspace H^E 
of TuE, such that HuE together with all vertical tangent vectors at u spans the 
entire tangent space T„E at u. Of course, the choice of subspaces HuE should 
depend smoothly on u. We shall call vectors in HuE horizontal, while keeping in 
mind that this concept builds upon the connection. 

As long as a connection is given on a fibre bundle, tangent vectors on the base 
manifold M can always be canonically lifted to E. That is, for any u G E and 
any tangent vector Xt^^^u) G TiriujM, there exists in HuE a unique tangent vector 
G TuE. In fact, this follows immediately from the fact that HE is isomorphic 
to tt*TM, as implied in the short exact sequence (2.1). Moreover, a smooth vector 
field X on M can be uniquely lifted to E, resulting in a vector field X-^ on E 
that is horizontal everywhere. This eventually enables us to lift any smooth curve 
7 : K —M on the base manifold to a horizontal curve 7 on E, defined by the ODE 



Note that the horizontal curve is uniquely determined once its starting point on E 
is specified. Therefore, given a smooth curve 7 : [0,1] —>■ M that connects 7 (0) to 
7 (1) on M, there exists a smooth map from iby(o) to ^. 7 ( 1 ) (at least when 7 (0) and 
7 ( 1 ) are sufficiently close), defined as 


-^7(0) 3 s I-A 7 s ( 1 ) € .^7(1) 


where 7 s denotes the horizontal lift of 7 with starting point s. Such constructed 
maps between neighboring fibres, obviously depending on the choice of path 7 , is 
called the parallel transport along 7 . Like the concept of horizontal tangent vectors, 
parallel transport depends on the choice of the connection. We shall denote the 
parallel transport from fibre Ey to fibre E), as Pfy ■. Fy ^ E^. When 7 is a unique 
geodesic on M that connects y to x, we drop the super-index 7 and simply write 
Pxy '■ Fy ^ Ex. We shall see later that the probabilistic interpretation of HDM 
(and even VDM) implicitly depends on lifting from the base manifold a path that 
is continuous but not necessarily smooth. Though this can not be trivially achieved 
by the ODE based approach, stochastic differential geometry has already prepared 
the appropriate tools for tackling this technicality (see e.g. [90, §5.1.2]). 

We now return to modeling the inherent inconsistency for geometric morphomet¬ 
ries. Similar to the diffusion map framework, where small distances are considered 
to approximate geodesic distances on the manifold, we assume, when the pairwise 
distance between surfaces Si , S 2 is relatively small among all pairwise distances 
within the collection, that the shape distance is approximately equal to the geo¬ 
desic distance on the base manifold. Moreover, under the fibre bundle assumption, 
we consider the pairwise correspondence map from Si to S 2 to approximate Ps 2 ,Sn 
the parallel transport along the geodesic connection to S 2 . By routing through 
different intermediates, one obtains different maps from Si to S 2 , which is concep¬ 
tually equivalent to parallel-transporting along different piecewise geodesics. Due 
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to the dependency on the underlying path, the parallel transport typically does 
not define globally consistent maps. The inconsistency shown in Fig. 2, caused by 
propagation along three different paths, fits into this geometric picture. 

The inconsistency of parallel transport, closely related to the curvature of the 
corresponding connection [2], is characterized by the notion of holonomy [91, 19, 11]. 
If for all x,y £ M the parallel transport : Fy ^ is independent of the choice 
of path 7 , then the connection is said to be flat or has trivial holonomy; otherwise 
the connection is non-flat or the holonomy is non-trivial. Fig. 3 illustrates the non¬ 
trivial holonomy of the Levi-Civita connection on the unit sphere in if we 
parallel transport a tangent vector v G TaS'^, first from AtoC along the equator 
and then from C to B along the meridian, then the result PbcPcav is generally 
different from Pbav, the result obtained by directly parallel transporting v from A 
to B along the meridian that connects the two points. 



Figure 3. Holonomy on a Unit Sphere 

The fibre bundle of interest in Fig. 3 is an example of a tangent bundle. Generally, 
the tangent bundle TM of a d-dimensional Riemannian manifold M is a fibre bundle 
with base manifold M, fibre and structure group O (d) (the d-dimensional 
orthogonal group); the fibre over each x G M is T^M, the tangent space of M 
at X. On this bundle, there uniquely exists a canonical connection, the Levi-Civita 
connection^ that is simultaneously torsion-free and compatible with the Riemannian 
metric on M. The unit tangent bundle UTM is a subbundle of TM, with the same 
base manifold and structure group, but has a different type of fibre S‘^, the unit 
{d — l)-dimensional sphere in M'^; the fibre over each x G M consists of all tangent 
vectors of M at a; with unit length. The Levi-Civita connection carries over to a 
canonical connection on UTM. We focus on analyzing HDM on these two types of 
fibre bundles in this paper. 

Note that the tangent bundle is also of fundamental importance for VDM. How¬ 
ever, as we shall see in Section 3, HDM aims at a goal different from VDM’s, even 
on tangent bundles: VDM acts on vector fields on M, or equivalently operates on 
sections of TM (denoted as F {M,TM)); HDM focuses on functions on TM, and 
thus operates on sections of the trivial line bundle TM xM. (denoted as F {TM, M.)). 
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While VDM embeds the base manifold M into a Euclidean space of lower dimension, 
HDM is more interested in how each fibre of TM corresponds to its neighboring 
fibres. In short, VDM and HDM extend DM in two different directions. 

The use of diffusion maps to solve the global registration problem was proposed 
earlier in the geometry processing community [80, 54], as was the concept of a 
“template” for a collection of shapes [96, 87, 68, 48, 47, 22]. These approaches 
were quite successful, albeit based mostly on heuristics; the fibre bundle framework 
provides geometric interpretations and insights for many of them. For instance, 
cycle-consistency-based approaches [68, 47] focus on improving the consistency of 
composed correspondence maps along 1,2,3-cycles, which is implicitly an attempt 
to recover from condition (5b) the fibre bundle structure that underlies the shape 
collection; from this point of view, these method sample only one point from each 
coordinate patch on the base manifold, and likely suffer from an inaccurate recovery 
due to low sampling rate. [54] uses the diffusion map as a visualization tool, based 
on a dissimilarity score computed from local and global shape alignments. This 
is similar to the random walk HDM constructs on a fibre bundle; the geometric 
meaning of the fuzzy correspondence score in [54] is vague from a manifold learning 
point of view, but then, it was not the main focus of [54] to analyze the new graph 
Laplacian on the discretized fibre bundle. 

From the fibre bundle point of view, the goal of many global registration problems 
is to learn the fibre bundle structure that underlies the collection of objects. Making 
an analogy with the terminology manifold learning^ we call this type of learning 
problems fibre learning. A flat connection, or its induced parallel transport, is 
the key to resolving the problem. However, we remark that the existence of a 
flat connection on an arbitrary fibre bundle is not guaranteed: the geometry and 
topology of the fibre bundle may be an obstruction. For a discussion on tangent 
bundles, see [61, 39]. 

3. Hypoelliptic Diffusion Maps: The Formulation 

3.1. Basic Setup. The data set considered in the HDM framework is a triplet 
,p,G), where 

(1) The total data set is formed by the union 

n 

^=[jx, 

where each subset Xj is referred to as the j-th fibre of , containing Kj 

points 

^3 ~ ^i,2) ■ ■ ■ J ^3,K.j } ■ 

We call the collection of fibres the base data set 

= ,X„}, 

and let TT : ^ he the canonical projection from to 

TT : jr — 

Xj,k '—> Xj, l<j<n,l<k<Kj. 

We shall denote the total number of points in as 


K = Ki + K2 + ■ ■ ■ + Kn 
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(2) The similarity measure p is a real-valued function on ^ x ^, such that 
for all ^, 776 ^ 

p(C ,0 = o, pitv) = p{v.O- 

On the product set Xi x Xj, we denote 

Pij ('^7 P ’ 

then pij is an ki x Kj matrix on M, to which we will refer as the similarity 
matrix between Xi and Xj. 

(3) The affinity graph G = (V, E) has K vertices, with each Vi^s corresponding 
to a point xt^s S ; without loss of generality, we shall assume G is 
connected. (In our applications, each Xi^s is typically connected to several 
Xj/s on neighboring fibres.) If there is an edge between vt^s and Vj^t in 
G, then Xi^s is a neighbor of Xj^t and Xj^t is a neighbor of Xi^s- Moreover, 
we also call Xi a neighbor of Xj (and similarly Xj a neighbor of Xi) if 
there is an edge in G linking one point in Xi with one point in Xj] this 
terminology implicitly defines a graph Gb = (VbtEb), where vertices of 
Vb are in one-to-one correspondences with fibres of , and Eb encodes 
the neighborhood relations between pairs of fibres. Gb will be called as the 
base affinity graph. 

3.2. Graph Hypoelliptic Laplacians. Let W G be the weighted adjacency 
matrix of the graph G, i.e., IT is a block matrix in which the (z, j)-th block 

(3.1) IT.,=p„. 

The {s,t) entry in Wij is thus the edge weight pij (s,t) between Vi^s and Vj^t- Note 
that IT is a symmetric matrix, since p is symmetric. Let D be the k x k diagonal 
matrix 

{ n i^j n I 

EE >EE W'„.,(K„,t) I, 

3 = 1 t=l 3 = 1 t=l j 

then the graph hypoelliptic Laplacian for the triplet {jjE,p,G) is defined as the 
graph Laplacian of the graph G with edge weights given by IT, that is 

(3.3) :=D- IT. 

Since G is connected, the diagonal elements of D are all non-zero, and we can define 
the random-walk and normalized version of 

(3.4) := D-^L^ = / - T^-^IT, 


(3.5) Lf := =1- D-i/^ITI?-^/^. 

Following [27] , we can also repeat the constructions above on a renormalized graph 
of G. More precisely, let Qa be the K x K diagonal matrix 

( ( N K, 

(3.6) Qa :=diag j f (l,t) 

where a is some constant between 0 and 1, and set 

(3.7) ITa := Q-'lTQ-i. 


N Kj 


WN,j {KN,t) 


.3=1 t=i 
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The graph hypoelliptic Laplacians can then be constructed for Wa instead of W, 
by first forming the K x K diagonal matrix 



(3.8) 


and then set 



(3.9) 



(3.10) 



(3.11) 


L- 


3.3. Spectral Distances and Embeddings. In order to define spectral distances, 
we shall use eigen-decompositions. This is the reason to consider the symmetric 
matrices their eigen-decompositions lead to a natural representation for 
since and L^,.^ are diagonal-similar: 



of V G can be written as the concatenation of n segments of length 

‘ ' j j 



where V[j] G defines a function on fibre Xj. Now let Aq < Ai < A 2 < • • • < 

Ak_i be the k eigenvalues of in ascending order, and denote the eigenvector 
corresponding to eigenvalue Aj as Vj. By our connectivity assumption for G, we 
know from spectral graph theory [23] that Aq = 0, Aq < Ai, and vq is a constant 
vector with all entries equal to 1; we have thus 

0 = Aq < Ai < A 2 < • • • < Ak-1. 

By the spectral decomposition of 


K— 1 


(3.12) 



1=0 


and for any fixed diffusion time t G K^, 


(3.13) 



1=0 


with the (f, j)-block taking the form 



(3.14) 


14 


TINGRAN GAO 


In general, this block is not square. Its Frobenius norm can be computed as 


(3.15) 



2 

= Tr 


\ / ZJ 

F 

\ ^ IJ ^ ^ IJ 



K— 1 


= Tr 




/,m—0 

K— 1 


= Tr 



l,m—0 


l,m—0 

Let us define the hypoelliptic base diffusion map 
-.SS —^ K''" 

(3-16) ^ 

Aj I 




(3,17) 




= (F‘ (Xff) 


' 0</,m<K—1 

then (denoting the standard Euclidean inner produce in as (•,•)) 

' ’■j 

with which we can define the hypoelliptic base diffusion distance on as 
(3.18) 

dHBDM.t = II (Xi) - (X,)|| 

= {{V^ {X,), F* (X,)) + (F‘ (X,), F* (X,)) - 2 (F‘ (X,), F‘ (X,))}" • 

The hypoelliptic base diffusion map embeds the base data set £§ into a Euclidean 
space using Gb, the base affinity graph with edges weighted by entry-wise non¬ 
negative matrices. In this sense, it is closely related to the vector diffusion maps [81]: 
if 

Ki = K2 = ■ ■ ■ = Kn = d 

and (relaxing the constraint p > 0) 


(vi,Vj) G Eb Pij = WijOij, wherewij > 0 and Oij is d x d orthogonal, 

then the weighted adjacency matrix W (as defined in (3.1)) coincides with the 
adjacency matrix S defined in [81, §3]. In this case, the graph hypoelliptic Laplacian 
of (^,p, G) reduces to the graph connection Laplacian for {Gb, {wij} , {Oij}). 
Note that in HDM we assume the non-negativity of the similarity measure p, which 
is generally not the case for vector diffusion maps. (The non-negativity of the 
eigenvalues of allows us to consider arbitrary powers in VDM, this is 

circumvented by considering powers of S^.) From a different point of view, by 
the Riesz Representation Theorem, smooth vector fields on a manifold M can be 
identified with linear functions on TM, thus VDM can be viewed as HDM restricted 
on the space of linear functions on TM. 

In addition to embedding the base data set HDM is also capable of embedding 
the total data set 3T into Euclidean spaces. Define for each diffusion time t G R"*" 
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the hypoelliptic dijfusion map 

(3-19) 

Xj,s '—^ (Aiiii[j] (s), X2V2[j] (s) , • • • , K-iVin-i)^] (s)) ■ 

where vi[j-^ (s) is the s-th entry of the j-th segment of the ^-th eigenvector, with 
j = 1, • • • , n, s = 1, • • • ,Kj. We could also have written 


Vi[j] (s) = vi {sj + s ), where Si = 0 and Sj = Kp for j > 2. 

p=i 

Following a similar argument as in [27], we can define the hypoelliptic diffusion 
distance on as 


(3.20) djjDM.t {xi^sT Xj i) — ||F7 (^i,s) dH . 

As a result, H* embeds the total data set ^ into a Euclidean space in such a 
manner that the hypoelliptic diffusion distance on ^ is preserved. Moreover, this 
embedding automatically suggests a global registration for all fibres, according to 
the similarity measure p. For simplicity of notations, let us write 

\x, 

for the restriction of H* to fibre Xj, and call it the j-th component of H^. Up to 
scaling, the components of H* bring the fibres of to a common “template”, such 
that points Xi^s and Xj^t with a high similarity measure pij {s,t) tend to be close 
to each other in the embedded Euclidean space. Pairwise correspondences between 
fibres Xi, Xj can then be reconstructed from the hypoelliptic diffusion map. Indeed, 
assuming each Xj is sampled from some manifold Fj , and a template fibre F C M"* 
can be estimated from 

then one can often extend (by interpolation) i7j from a discrete correspondence to 
a continuous bijective map from Fj to F, and build correspondence maps between 
an arbitrary pair Xi,Xj by composing (the interpolated continuous maps) and 
(i7j) A similar construction was implicit in [54]. Sometimes, it is more useful 
to consider a normalized version of hypoelliptic diffusion map that takes value on 
the standard unit sphere in 

: 3F ^ c K'" 


(3.21) 




i.Xj,s) 
Hj {Xj,s) 


We shall see an example that applies to SO (3) in Section 5. 


4 . HDM ON Tangent and Unit Tangent Bundles 

The HDM framework is very flexible: if each fibre consists of one single point, 
the hypoelliptic graph Laplacian reduces to the graph Laplacian that underlies 
diffusion maps; if all the fibres have the same number of points and all similarity 
matrices (defined in Section 3.3) are orthogonal (up to a multiplicative constant), 
the hypoelliptic graph Laplacian reduces to the graph connection Laplacian that 
underlies vector diffusion maps. The goal of this section is to relate HDM to some 
other partial differential operators of geometric importance on tangent and unit 
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tangent bundles of (compact, closed) Riemannian manifolds. In a follow-up paper, 
we will extend the geometric setting to more general fibre bundles. 

This section builds upon the hbre bundle assumption. Adopting notation in 
Section 3.1, we assume that is sampled from a fibre bundle A, and each fibre 
Xj is sampled from a hbre over some point on the base manifold M. Moreover, 
we shall assume that E is the tangent bundle or unit tangent bundle of M, i.e., 
E = TM or E = UTM. For the convenience of the reader, some basic properties 
about the geometry of these hbre bundles are reviewed in Appendix A. 

4.1. HDM on Tangent Bundles. Let AT : —>■ be a smooth kernel function 

supported on the unit square [0,1] x [0,1]. In all that follows, we shall assume that 
M is a compact manifold without boundary, which, according to standard custom, 
we shall simply call a closed manifold. Let the closed manifold M be equipped with 
a Riemannian metric tensor g, which induces on M a geodesic distance function 
djvf ^ ^ M —>■ IR-°; g dehnes an inner product on each tangent space of M, 

denoted as 

(4.1) {u,v)^ = gjkix)u^v^, u.veT^M, 

where, and for the remainder of this section unless otherwise specihed, we have 
adopted the Einstein summation convention. The vector norm on T^M with respect 
to this inner product shall be denoted as 

(4.2) \\u\\^ = {gjkix)u^u'‘)^ , uGT^M. 

We denote 


: T^M 


TyM 


(4.3) 

for the parallel transport from x € M to y G M with respect to the Levi-Civita 
connection on M, along a geodesic segment that connects x to y. It is well known 
that a tangent vector can be parallel-transported along any smooth curve on M\ 
since M is compact, its injectivity radius Inj (M) is positive, and thus any x G M 
lies within a geodesic normal neighborhood in which any point y can be connected 
to X through a unique geodesic with length smaller than Inj (M). Therefore, Py^x 
is well-defined, at least for x,y G M with dM {x,y) < Inj (M). Furthermore, for 
such x,y G M, Py^x is an orientation-preserving isometry between the domain and 
target tangent planes [30, Exercise 2.1]. 

For bandwidth parameters e > 0, (5 > 0, define for all {x,v ), {y,w) G TM 


(4.4) 


ATe.a {x,v;y,w) := K 


d\i {x, y) 


\Py,xV-w\\y 


Note that the requirement that supp (AT) C [0,1] x [0,1] implies that s {x, v; y, w) yf 
0 only if dM (x, y) < \/e. It follows that Py,x^ and further {x^ v; y, w), are well- 
defined when e < Inj (M)^; we shall restrict ourselves to such sufhciently small e. 
K^ s is symmetric because Py^x is an isometry between TxM and TyM: 


dlliv^x) \\Px,yW-v\\l\ J.(dlj{x,y) \\Py,x{Px,yW - v) 


K,,s{y.w-x,v)=K\ ^ ^ ' 


dh ix,y) \\w- Py,xv\\l 


= {x,v;y,w ). 
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This symmetry is of particular importance for the definition of symmetric diffusion 
semigroups [89]. 

Let p S (TM) be the probability density function according to which we shall 
sample. Assume p is bounded from both above and below (away from 0): 

(4.5) 0 < Pm < pix,v) < Pm < oo, \/{x,v)&TM. 

Define 

(4.6) / K^^s{x,v;y,w)p{y,w) d^{y,w). 

JTM 

where dp is the standard volume form on TM . As in Appendix A, dp is a product of 
dVy (w) (the standard translation- and rotation-invariant Borel measure on TyM) 
and dvolM (y) (the standard Riemannian volume element on M). If we fix a; € M 
and integrate p (x, v) along T^M, then 


(4.7) p{x):= 

is a density function on M, since 


p (x, v) dVx (v) 


IT^M 


/ p{x)dvo\{x)= / / p {x, v) dVx {v) dvo\ (x) = / p {x,v) dp {x,v) = 1. 

Jm JmJt^M JTM 


We call p the projection of p on M. Furthermore, dividing p {x, v) by p (x) yields a 
conditional probability density function on T^M 


(4.8) 


p{v\x) 


p{x,v) 

p{x) 


since 


p{v \x) dVx (v) = 




p{x) 


= 1 . 


JT^M P{x) p{x) 

For any / G C°° (TM), we can define its average along fibres on TM, with respect 
to the conditional probability density functions, as the following function on M: 


(4.9) 


fix) = 


f {x, v)p{v I a;) dVx (v ), \/x G M. 


>T^M 


(4.10) 


Finally, for any 0 < a < 1 , define the a-normalized kernel 

ix,v-y,w) 


KZs ix,v;y,w) := 


Pf,s pZs iV’ ' 

We are now ready to define a family of hypoellitpic diffusion operators on TM as 


(4.11) 




ITM 


K“s (x, V] y, w) f (y, w) p (y, w) dp {y, w) 


/ ( t , V- y, w) p (y, w) dp (y, w) 

Itm 


for any f G C°° (TM). 

We are interested in the asymptotic behavior of Hf‘g in the limit e —?► 0, <5 —>■ 0. 
It turns out that this depends on the relative rate with which e and S approach 0. 
For simplicity of notation, let us write 7 = i5/e. 
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Theorem 4.1 (HDM on Tangent Bundles). Let M be a elosed Riemannian mani¬ 
fold, and Hf‘g : (TM) —>■ (TM) defined as in (4.11). If 6 = O (e) as e ^ 0, 

or equivalently if ^ = Sje is asymptotically bounded, then for any f G C°° {TM), 
as e 0 (and thus S ^ 0), 

(4.12) 


HT,sf{x,v) 


f{x,v)-\-€ 


m2i 

2mo 


[fp^ °‘] {x,v) 
{x, v) 


f{x,v) 


p^~°‘ {x, v) 


jTO22 

0 - - 

277 Z.Q 


A^ [/pi °] (x,i;) 

{x, v) 


f{x,v) 


^Vpl a (^x,v) 
p^~°‘ {x, v) 


+ 0{e^ + S^), 


where mg, m 2 i, TO 22 axe positive constants depending only on the kernel K. 


Theorem 4.1 is the tangent bundle version of Theorem 4.5 (which applies to unit 
tangent bundles), and the proofs for these two theorems are essentially identical. We 
included a proof of Theorem 4.5 in Appendix B, from which a proof of Theorem 4.1 
can be easily adapted. 


Proposition 4.2. Let M and be as in Theorem 4.1. If S = ye, then for any 
f G C°° {TM) and sufficiently small e > 0, 

(4.13) 


lim f {x, w) = / (x) + I 

7—>-00 ’ ' 


2mn 


Am [7p^ °] {x) AmP^ ° {x) 

pi-“ {x) ■' pi-“ (a;) 


0 \ 


where Am is the Laplace-Beltrami operator on M, p is the projected density function 
on M, f {x) is the average of f along fibres on TM, and TOq, are constants that 
only depend on the kernel function K. 


Corollary 4.3. Let M and Hf‘g be as in Theorem 4.1. // (5/e = 7 —)■ 00 as e —>■ 0, 
(5 —>■ 0, then for any f G C°° {TM), in general 


ix,v) - f {x,v) 

lim lim-^- 

7—>-oo 5—>-0 e 


, f {x,v) - f {x,v) 

^ lim lim -^- 

( 5 —vO 7—>-oo 6 


and thus an asymptotic expansion of (x^v) for small e, 5 is not well-defined. 

In faet, for each fixed 7 , as in (4.12), 

(4.14) 


HZ-yef =f{x,v) + 


_W21 

2too 


AH [/pi °‘]{x,v) A^pi °{x,v) 

J ix, V) 


p" 


{x,v) 


p" 


+7e 


W 22 

2?7in 


A'^ [/p^ “] (a;w) rf X A'^pi °‘{x,v) 
-- f[x,v) - 


P 


{x,v) 


p 


1-a 


{x,v) 


{x,v) 


O(e^) 


whereas by Proposition 4.2 
(4.15) 


Am [7p^-"] {x) 
pi-“ (x) 


fix) 


AmP^~" {x) 

pi-“ (x) 


+ O (e^) . 


Corollary 4.4. Under the same assumptions and notation as in Theorem 4.1, if 
0 = 1, then 
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(i) If S = 0(e) as e —>■ 0, then for any f G C°° (TM), as e —>■ 0 (and thus S ^ 0), 


(4.16) HlJ (t, v) = f (x, v) + A^/ (x, v) + 6^A^f (x, v) + 0 (e^ + S^) ; 

ZTUq ZTTlf) 

(a) For any f G C°° (TM), 


■ m22 .V. 


(4.17) 


Hl-yef {x, v) = f (x) + Am/ (x) + O (e^) . 


7—>-oo 


4.2. HDM on Unit Tangent Bundles. The construction of HDM for unit tan¬ 
gent bundles is very similar to the construction in Section 4.1. We only need to 
replace the volume element dfa on TM with dO, the Liouville measure on UTM 
(see, e.g., [21, Chapter VII]), and modify the definition of in (4.4) into 


(4.18) 


K(.,8 (x,v,y,w) := K 


d-li (x, y) 


dg 

Oy 


{Py, 


y,xL 


where Sy is the unit sphere in TyM, and ds^ (•,•) is the geodesic distance on Sy 
under the induced metric from TyM. Note that as defined in (4.18) is still 
symmetric. Abusing notation, we shall not distinguish the in (4.4) with the 
unit tangent bundle version (4.18), whenever the specification can be inferred from 
contexts. Similarly, notation 

(4.19) p(x):= [ p(x,v) dVj:(v), 


(4.20) 

and 


p(v\x) 


p(x,v) 

p(x) 


(4.21) 


f(x)= f(x,v)p(v\x)dVAv), yf GC^(UTM),yxGM 


will stay the same as in (4.7), (4.8), and (4.9). Like in (4.11), now we can define a 
family of hypoellitpic diffusion operators on UTM for any 0 < a < 1 as 


(4.22) H^J(x,v):=dM™ 

for any f G (UTM). 


( 2 ^’ '^) / (y> '^) p 


UTM 


Ms (x, v; y, w) p (y, w) dS (y, w) 


Theorem 4.5 (HDM on Unit Tangent Bundles). Let M be a closed Riemannian 
manifold, and : (7°° (UTM) -G C°° (UTM) defined as in (4.22). If 5 = O (e) 
as e —>■ 0, or equivalently if ^ = die is asymptotically bounded, then for any f G 
(UTM), as e ^ 0 (and thus 5 ^Q), 

(4.23) 




f(x,v) +e 


TO 21 

2too 


Af [fp^ (x,v) 
p^~°‘ (x, v) 


f(x,v) 


AgP^ “(t,t) 
p^~°‘ (x, v) 


Zttiq 


M [fp"~"] ix,v) 

(x, v) 


f{x,v) 


Mp^ °‘(x,v) 

(x, v) 


+ 0(eM6^), 


where TOq, m 2 i, m 22 are positive constants depending only on the kernel K. 
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We included a proof of Theorem 4.5 in Appendix B. The proof of Theorem 4.1 
is essentially the same. 


Proposition 4.6. Let M and be as in Theorem 4.5. If 5 = ye, then for any 
f £ C°° (UTM) and sufficiently small e > 0, 

(4.24) 

'Am [7p'-] (x) _ ^ ^ q 


lim m f {x, v) = f{x) + e 


7 —VOO 


2 mg 


p^-°‘ (x) 


(a;) 


where Am is the Laplace-Beltrami operator on M, p is the projected density function 
on M, f (x) is the average of f along fibres on UTM, and mg, m^ are constants 
that only depend on the kernel function K. 


Corollary 4.7. Let M and Hfg be as in Theorem 4.5. // i5/e = 7 —>■ 00 as e —>■ 0, 
(5 —>■ 0, then for any f G C°° {UTM), in general 


lim lim 

7—>-oo 5 —>-0 


- f{x,v) 
e 


^ lim lim 
(5—>-0 7 —>-oo 


- fix, V) 

e 


and thus an asymptotic expansion of Hf‘gf {x,v) for small e,S is not well-defined. 
In fact, for each fixed 7 , as in (4.23), 

(4.25) 


H^^-fef {X,V) =f{x,v)+€ 


W 21 

2mo 


Af [fp^ “] {x,v) 
p^~°‘ {x, v) 


f{x,v) 


Agp^ “(x,n) 
(a:, v) 


+7e 


TO 22 

2mg 


[V-"] {x,v) 
p^~°‘ {x, v) 


f{x,v) 


A'^p^ °‘{x,v) 
p^~°‘ {x, v) 


0{e^ 


whereas by Proposition 4.6 
(4.26) 

lim Hf‘ f {x, v) = f (x) + e^ 

7-^00 ’ ' ZUIq 


Am [fp^ °‘] jx) 

{x) 


fix) 


jx) 

f--°‘ (x) 


0 \ 


Corollary 4.8. Under the same assumptions and notation as in Theorem 4.5, if 
a = 1, then 

(i) If S = O (e) as € ^ 0, then for any f G C°° {UTM), as e —>■ 0 (and thus 

6 —>■ 0 ), 


(4.27) Hlgf {x, v) = f {x, v) + Af / {x, v) + A^/ {x, v) + O (e^ + 6^) 


2 mn 


(ii) For any f G C°° {UTM), 


(4.28) lim Hl^J {x, v) = f (a:) + e;^ Am/ (a;) + O (e^) . 

4 . 3 . Finite Sampling on Unit Tangent Bundles. Though the theory of hy- 
poelliptic diffusion maps on unit tangent bundles is completely parallel to its coun¬ 
terpart on tangent bundles, in practice it is usually much easier to sample from 
the unit tangent bundle since it is compact whenever the base manifold is. It thus 
makes much more sense to study finite sampling on unit tangent bundles. In this 
section, we first consider sampling without noise, i.e. where we sample exactly 
on unit tangent bundles; next, we study the case where the tangent spaces are 
empirically estimated from samples on the base manifold. The latter scenario is 
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a proof-of-concept for applying the hypoelliptic diffusion map framework to much 
more general fibre bundles in practice, where data representing each hbre are often 
acquired with noise. The proofs of Theorem 4.12 and Theorem 4.14 can be found 
in Appendix B. In Section 5, we shall demonstrate a numerical experiment that 
addresses the difference between the two sampling strategies. 

4.3.1. Sampling without Noise. We begin with some assumptions and definitions. 
Assumption 4.9 includes our technical assumptions, and Assumption 4.10 specifies 
the noiseless sampling strategy. 

Assumption 4.9. (1) i. : M ^ K-® is an isometric embedding of a d-dimensional 

closed Riemannian manifold into K^, with D ^ d. 

(2) Let the two-variable smooth function AT : be compactly sup¬ 

ported on the unit square [0,1] x [0,1]. The partial derivatives diK, d 2 K 
are therefore automatically compactly supported on the unit square as well. 
(In fact, a similar result still holds if K and its first order derivatives decay 
faster at infinity than any inverse polynomials; we avoid such technicalities 
and focus on demonstrating the idea, using compactly supported K.) 

Assumption 4.10. The {Nb x Np) data points 


Xi,Nf 

X2,Nf 


Xl,l, 3 ^ 1 , 2 , 

2 ^ 2 , 1 , 2 ^ 2 , 2 , 




XNb zNf 


are sampled from UTM with respect to a probability density function p{x,v) sat¬ 
isfying (4.5), following a two-step strategy: (i) sample Nb points ^i, - • • ,^Nb i-i-'i- 
on M with respect to p, the projection of p on M (recall (4.19)); (ii) sample Np 
points xpi,--- jXj^Np 012 with respect to p(- | ^j), the conditional probability 
density on the fibre (recall (4.20)). 

Definition 4.11. (1) For e > 0, d > 0 and 1 < < Nb, 1 < t, s < Np, 


dehne 



where : S^. —)■ S^. is the parallel transport from S^. to S^.. Note the 
difference between and (dehned in (4.18)): uses Euclidean 

distance while uses geodesic distance. 

(2) For 0 < a < 1, define 


N B NF 



i=i s=i 


and the empirical a-normalized kernel A“^ 



1 < bi < Nb, 1 < r, s < Np. 





22 


TINGRAN GAO 


(3) For 0 < a < 1 and / G C°° (UTM), denote the a-normalized empirical 
hypoelliptic diffusion operator by 

Nb Nf 

^ ^ Xj,s) f (Xj^s) 

Ksf i-.r) = -. 

-^£,(5 ®i.s) 

j = l s=l 

Theorem 4.12 (Finite Sampling without Noise). Under Assumption 4.9 and As¬ 
sumption 4.10, if 

(i) 6 = O (e) as e —>■ 0; 

(ii) 

Nf 

= p e (0, oo), 

Nq^oo _/V d 

then for any Xi^r with 1 < i < Nb and 1 < r < Np, as e —>■ 0 (and thus 6^0), 

with high probability 

(4.29) 


W21 


He,sf (Xi,r) = f iXi,r) + e„ 

Ittiq 


Af [/pi “] ^Afpi“(xi,^) 

pi-“ (a;*,.) ^ pi-“ (x,.,) 


- W22 

2TOn 


^s[fP^ ^ A^pi “ (xi.,.) 

pi-“ (x.,.) pi-“ (x,.,) 


+ O(e2+52 + 0-iw =£- 


O' 


where 


= 1 - 


. i - ti-i / 

1 + e'liO 'i W -— 

V iVn 


We give a proof of Theorem 4.12 in Appendix B.2. 


4.3.2. Sampling from Empirical Tangent Spaces. In practice, it has been shown 
in [81] that, under the manifold assumption, a local PCA procedure can be used 
for estimating tangent spaces from a point cloud; we are using PCA here as a 
procedure that determines the dimension of a local good linear approximation to 
the manifold, and also, conveniently, provides a good basis, which can be viewed 
as a basis for each tangent plane. To sample on these tangent spaces, it suffices to 
repeatedly sample coordinate coefficients from a fixed standard unit sphere; each 
sample can be interpreted as giving the coordinates of a point (approximately) on 
the tangent space. Parallel-transports will take the corresponding point that truly 
lies on the tangent space at ^ to the tangent space at (, another point on the 
manifold. This new tangent space is, however, again known only approximately; 
points in this approximate space are characterized by coordinates with respect to 
the local PCA basis at (. We can thus express the whole (approximate) parallel- 
transport procedure by maps between coordinates with respect to PCA basis at ^ 
to sets of coordinates at C; these changes of coordinates incorporate information on 
the choices of basis at each end as wells as on the parallel-transport itself. 
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Let US now describe this in more detail, setting up notation simultaneously. 
Throughout this section, Assumption 4.9 still holds. Let {^i, • • • ,^Nb} be a collec¬ 
tion of i.i.d. samples from M; then the local PCA procedure can be summarized as 
follows: for any 1 < j < Nb-, let • • • ,^jf, be its k nearest neighboring points. 
Then 

~ 1 ~ 5 ■ ■ ■ ) Ofc ~ Ci] 

is a D xk matrix. Let ATpcA be a positive monotonic decreasing function supported 
on the unit interval, e.g., the Epanechnikov kernel 

ATpcA (u) = (l - X[o.i]> 


where x is the indicator function. Fix a scale parameter epcA > 0, let Dj be the 
k X k diagonal matrix 


Dj = diag W ATpcA 


\/epCA 


I ATpcA 


V^PCA 



and carry out the singular value decomposition (SVD) of matrix XjDj as 

An estimated basis Bj for the local tangent plane at is formed by the first d left 
singular vectors (corresponding to the d largest singular values in S^), arranged 
into a matrix as follows: 


Bj = 


,(i) 


•• 


id) 


oDxd 


Note that the intrinsic dimension d is generally not known a priori] [81] proposed 
estimating dimension locally from the decay of singular values in Ej, and then take 
the median of all local dimensions to estimate d; [59] proposed a different approach 
based on multi-scale singular value decomposition. 

Once a pair of estimated bases Bi^Bj is obtained for neighboring points 
one estimates a parallel-transport from to T^jM as 

Oji := argmin [|0 - BJ SJ] , 
oeo(d) ^ 


where jl'Hjjg is the Hilbert-Schmidt norm. Though this minimization problem is 
non-convex, it has a efficient closed-form solution via the SVD of Bj Bj , namely 

Oji = UV^, where bJB, = is the SVD of bJB,. 

It is worth noting that Oji depends on the bases; it operates on the coordinates 
of tangent vectors under Bi and Bj, as explained above. Oji approximates the 
true parallel-transport (composed with the bases-expansions) with an error 

of O (cpca), in the sense of [81, lemma B.l]. 

We summarize our sampling strategy for this section (with some new notations) 
in the following definition. 


Definition 4.13. (1) Let {Ciu’’ jCmbI be a collection of samples from the 

base manifold M, i.i.d. with respect to some probability density function 
p G C°° (M). For each ^j, 1 < j < Nb, sample Np points uniformly from 
the (d — l)-dimensional standard unit sphere S‘^~^ in and denote the 
set of samples as = {cj^, ■ ■ ■ where each is a d x 1 column 
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vector. Using the basis Bj estimated from the local PCA procedure, each 
Cj^s corresponds to an “approximate tangent vector at denoted as 


:= B, 


We use the notation =5^ for the unit sphere in the estimated tangent space 
(i.e., the column space of Bj). Note that the are uniformly 

distributed on 

(2) By [81, lemma B.l], for any Bj there exists a D x d matrix Qj, such that 
the columns of Qj constitutes an orthonormal basis for and 

W^j - QjIIhs = ^ (epca) ■ 


We define the tangent projection from to the estimated tangent plane 
as 


Td. 1-^ Tn 


QjQj ‘^j,s 
QjQj '^j,s 


This map is well-defined for sufficiently small epcA, a^nd then it is an isom¬ 
etry. Its inverse is given by 


j,S ' ^ '^j,s 


BjBjrj^s 

BjBjrj^s 


Note that we have 


— ^^PCA 

for some constant C > 0 independent of indices j, s. Since we sample each 
S^j uniformly and the projection map Tj^g U.s is an isometry, the points 
{rjp, • • • ,Tj,7Vf} are also uniformly distributed on The points 



''" 1 . 2 , •' 

’ ’ J ^ 1 1 Np 


^ 2 , 2 , 

: '^2,Np 

Nb , 17 

'^Nb,2, ■■ 

; Nb ,Np 


are therefore distributed on UTM according to a joint probability density 
function p on UTM defined as 

p{x,v) =p{x), y{x,v)GUTM. 

As in Assumption 4.10, we assume p satisfies (4.5), i.e., 

0 < Pm ^Pix,v) =p {x) < Pm < oo, V {x, v) S UTM 


for positive constants Pm^PM- 

(3) For e > 0, (5 > 0 and 1 <i,j < Nb, 1 < r, s < Np, define 


( K \\OpCi,r - Cj^sV 

{Tip:Tj^s) = < \ e 5 

lo, 


j + j, 

i = j. 


where Oji is the estimated parallel-transport from T^^M to T^.M. 
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(4) For 0 < a < 1, define 

Nb Nf 

Qe,S {Ti,r) = ^ ^ (Ti,r, 'Tj,s) 


j=l s=l 


and 


T,. ,) = , 1 < f, j < iVs, 1 < r, s < Nf. 

^e,5 ^e,S 

(5) For 0 < a < 1 and / e C7°° (UTM), denote 


^Fsfir.,r) = 


Nb Nf 

f (Tj. s) 

j^l 5^1_ 

Nb Nf ‘ 

j = l S = 1 


Theorem 4.14 (Finite Sampling from Empirical Tangent Planes). In addition to 
Assumption 4.9, suppose 

(i) £pcA = O (N^ as Nb —>■ 00 ; 


(ii) As e ^ 0, S = O (e) and S ^ (^^pcA + / 

(in) 

Np 

Njr^oo ^ 

Then for any Ti^r with 1 < i < Np and 1 < r < Np, as e ^ 0 (and thus S ^ 0), 

with high probability 

(4.30) 


^e^sf = f + e 


^21 

2mo 


[fP^ “] _ rf- ^ Afpl “(Ei,r) 

/ V^,r) 


“ (Tip) 


P^ “ (Tip) 


. rn22 

2mo 


[fP^ “] (Tip) _ X “ {Ti^r) 

J Vtp) 


P^ “ {Tip) 


P^ “ {Tip) 


+ 0(e2 + 52+6»-iAr 5e-f + j- 


where 


= 1 - 


(e?CA + e^)) 


d d-i 

1 + e*d 4 



We give a proof of Theorem 4.14 in Appendix B.2. 


5. Numerical Experiments and the Riemannian Adiabatic Limits 

In this section, we consider a numerical experiment on SO (3), the special linear 
group of dimension 3, realized as the unit tangent bundle of the standard sphere 
in K^. We shall compare both sampling strategies covered in Section 4. 
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5.1. Sampling without Noise. In the first step, we uniformly sample Nb = 
2, 000 points • • • ,^Nb} the unit sphere S'^, and find for each sample point 
the Kb = 100 nearest neighbors in the point cloud. Next, we sample Np = 50 
vectors of unit length tangent to the unit sphere at each sample point (which in 
this case is a circle), thus collecting a total of Nb x Np = 100,000 points on 
UTS'^ = SO (3), denoted as 

{xj,s \1 <J<Nb,1<s<Np}. 


The hypoelliptic diffusion matrix H is then constructed as an Nb x Nb block matrix 
with block size Np x Np, and Hij (the (i, j)-th block of H) is non-zero only if the 
sample points are each among the i^s-nearest neighbors of the other; when 
Hij is non-zero, its (r, s)-entry (1 < r, s < Np) is non-zero only if and 

Xj^s are each among the Kp = 50 nearest neighbors of the other, and in that case 

2 \ 


Hij (r, s) = exp - 


(5.1) 


= exp 




lie.-?,- 




exp 










\ 


where the choices of e, 5 will be explained below. Note that for the unit sphere 
the parallel-transport from to can be explicitly constructed as a 

rotation along the axis ^i x Finally, we form the a-normalized hypoelliptic 
diffusion matrix Ha by 


(5.2) 


(r, s) = 


Hij (r, s) 


/Nb Np 

^ {r^rn)] ( XI ^ (r, n) 


^1=1 m=l 


\k—ln—1 


and solve the eigenvalue problem 

(5.3) (^D-^HaD-^'^U = UA 

where D is the (NpNp) x (NpNp) diagonal matrix with entry {k,k) equal to the 
fc-th column sum of Ha- 


NbNf 

D (k,k) = X {k,v ), 


and A is a diagonal matrix of the same dimensions. Throughout this experiment, 
we fix a = 1, e = 0.2 and choose various values of <5 ranging from 0.0005 to 50, and 
observe the spacing of the eigenvalues stored in A. 

The purpose of this experiment is to investigate the influence of the ratio 7 = (5/e 
on the spectral behavior of graph hypoelliptic Laplacians. As shown in Figure 4, 
the spacing in the spectrum of these graph hypoelliptic Laplacians follow patterns 
similar to the multiplicities of the eigenvalues of corresponding Laplacians on SO (3) 
(governed by the relative size of (5 and e). In Figure 4(a), i5 <C e, hence the graph 
hypoelliptic Laplacian approximates the horizontal Laplacian on SO (3) (according 
to Theorem 4.5), in which the smallest eigenvalues have multiplicities 1, 6,13, • • •; 
in Figure 4(b), (5 = O (e), hence the graph hypoelliptic Laplacian approximates the 
total Laplacian on SO (3) (according to Theorem 4.5), with eigenvalue multiplic¬ 
ities 1,9, 25, •••); in Figure 4(c), (5 » e, hence the graph hypoelliptic Laplacian 
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approximates the Laplacian on the base manifold 5'^ (according to Corollary 4.7), 
with eigenvalue multiplicities 1, 3, 5, • • •). Note that in Figure 4(c) we fixed e and 
pushed 6 to oo, which is essentially equivalent to the limit process in (4.24) rather 
than (4.25). Moreover, if in each figure we divide the sequence of eigenvalues by the 
smallest non-zero eigenvalue, the resulting sequence coincide with the eigenvalues 
of the corresponding manifold Laplacian up to numerical error. For a description 
of the spectrum of these partial differential operators, see [92, Chapter 2]. 



Figure 4. Bar plots of the smallest 36 eigenvalues of 3 graph 
hypoelliptic Laplacians with fixed e = 0.2 and varying d (sampling 
without noise) 


5.2. Sampling from Empirical Tangent Spaces. Similar to sampling without 
noise, we uniformly sample Nb points - ,^Nb} the unit sphere in the 

first step, then construct the LCs-nearest-neighbor-graph for the point cloud with 
Kb = 100; the only difference is that here Nb = 4,000. (This finer discretization 
is necessary since we know from Theorem 4.14 and Theorem 4.12 that sampling 
from empirically estimated tangent spaces results in a slower convergence rate for 
HDM on unit tangent bundles. For the same reason we choose a larger Np, see 
below.) Next, we perform local PCA (with Kpca{u) = e“®“ X[o.i]) in the Kb- 
neighborhood around each sample point and solve for Oij from the local PCA 
bases Bi, Bj whenever ^i,^j are among the ATs-nearest-neighbors of each other. We 
then sample Np = 100 points from the standard unit circle in for each 
and denote them as 

{cj,s \1 <J<Nb,1<s<Np}. 

The block construction of the hypoelliptic diffusion matrix H is similar to the 
noiseless case, but with non-zero Hij (r, s) replaced with 


(5.4) 


H,, (r,s) = exp (-MwJi ) exp ( 


= exp 


ll?j Oil I \\OjiCi^r 


Finally, we construct as in (5.2), set a = 1, and solve the same generalized 
eigenvalue problems (5.3) with fixed e = 0.2 and varying 6. As shown in Figure 5, 
the spacing of the spectrum of graph hypoelliptic Laplacians is quite similar to 
what was obtained in sampling without noise. 













































28 


TINGRAN GAO 



(a) S = 0.002 


Im 


(b) S = 0.042 



(c) (5 = 20 


Figure 5. Bar plots of the smallest 36 eigenvalues of 3 graph 
hypoelliptic Laplacians with fixed e = 0.2 and varying d (sampling 
from empirical tangent spaces) 


5.3. As-Flat-As-Possible (AFAP) Connections. The purpose of this exper¬ 
iment is to provide some insights into the embeddings introduced in (3.16) and 
(3.19). As mentioned in Section 3.3, the embedding resulting from hypoelliptic 
diffusion maps tends to map “similar points” (where the similarity is specified by 
the pairwise correspondences) on different fibres to points on a common “template” 
fibre that are close to each other with respect to the Euclidean space into which the 
embedding takes place. This is illustrated in Figure 6 using the HDM obtained from 
the unit sphere example in Section 5.2, with Nb = 4,000, Np = 100, Kb = 100, 
Kp = 50, e = 0.2, and S = 0.042. We pick an arbitrary point Xj^s on S^-, the j-th 



Figure 6. The vector field (near ^j) on determined by (5.5) 


fibre sampled from the unit tangent bundle, which stands for a unit tangent vector 
(see the black arrow in Figure 6) to the unit sphere at the j-th sample. On each 
fibre where I < k ^ j < Nb , we then look for 


(5.5) 


Pp 


Xk.T&Sllf, 




where Hi. is defined in (3.21), and we choose t = 1. The resulting collection of unit 
tangent vectors 


(5.6) 


r := J U [Pik,iNj^s I 1 < A: ^ j < Ab| 
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gives rise to a discretization of a section on the unit tangent bundle UTS'^; this 
discretization can then be extended to the entire by interpolation. Since the 
connection we used in this construction of HDM is the canonical Levi-Civita con¬ 
nection, the similarity between two points on different fibres are measured according 
to their deviation from being parallel along geodesic segments to each other; there¬ 
fore, each stands for the unit tangent vector in the fibre 5"^^ that is the 

closest to among all discrete samples {xk^i | 1 < ? < Np}. As shown in 

Figure 6 , near the vector field F (extended by interpolation) is approximately 
constructed from parallel-transporting Xj^s to its neighboring fibres along geodesic 
segments. HDM thus implicitly constructs vector fields on that are locally as 
close to a parallel vector field as possible, though we know from differential topology 
that, globally, there is no “truly parallel” unit-norm vector field on the manifold 
5^. In this particular example, the “as-parallel-as-possible vector fields” produced 
by HDM can also be interpreted as generated by a connection that is as flat as pos¬ 
sible (AFAP), or as close to trivial as possible. A related construction on triangular 
meshes can be found in [29], which relies heavily on the connectivity information 
stored in the mesh structure. It is worthwhile to note that our approach using HDM 
is fundamentally different, in that our computational approach uses only a random 
neighborhood graph of the point cloud constituted by approximate samples of the 
manifold, as opposed to a structured triangular mesh. 


5.4. An Excursion to the Riemannian Adiabatic Limits. The formulae (5.1) 
and (5.4) provide an alternative interpretation (other than the one given by the 
HDM framework developed in this paper) for our numerical experiments in Sec¬ 
tion 5.1 and Section 5.2, as follows. If we set 7 = 5/e, then 


exp 


= exp 




Pf 




Xj,'p X n 


\ 




\ 


and our numerical experiments can be understood as applying the standard diffu¬ 
sion map (with bandwidth parameter e) to the total manifold of SO (3), except that 
the total manifold is equipped with a family of metrics different from the canonical 
bi-invariant one. These metrics all rely on the splitting of the tangent bundle of 
by the Levi-Civita connection (as defined in Appendix A, see (A. 5)), and are formed 
by recombining the horizontal and vertical components of the Sasaki metric tensor 
using a parameter 7 > 0 that controls the relative weight of the two components. 
This is in contrast to the interpretation given by the hypoelliptic diffusion map 
framework: assuming 7 > 0 is fixed (implying S = O (e)), recall from Theorem 4.5 
that 


(5.7) 


lim 


HLJ (x) - f {x) 


m2i 

2too 



m22 

m2i 



fix), 


thus 7 controls the infinitesimal generator of the diffusion process in consideration, 
while the metric on the total manifold of SO (3) is fixed. (Though we do not fix 7 in 
our numerical experiments, the limit in (5.7) still provides insights for small values 
of e > 0.) This duality between metrics and infinitesimal generators, reflected in the 
change of the relative size of the bandwidth parameters, is a natural consequence 
from a differential geometric point of view: the Laplace-Beltrami operator on a 
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Riemannian manifold depends on the choice of the Riemannian metric tensor; while 
the bandwidth parameters are characteristics of the chosen diffusion map, they 
can equivalently be interpreted as deformations of the underlying metric tensor. 
We would also like to mention related work that investigated the link between 
bandwidth and kernel density estimation [16], as well as recent progress in analyzing 
diffusion kernels with data-dependent bandwidth [10]; their relation with HDM will 
be explored in more detail in future work. 

The decomposition of tangent spaces of the total manifold is not only an essential 
element in the HDM framework but also the source of the duality relation discussed 
above. In differential geometry, such a decomposition can be studied in the broader 
context of Riemannian submersions, the purpose of which is to study the index 
theory for a family of smooth manifolds (parametrized by a base manifold). It 
is then important to “blow-up” the horizontal component of the metric so as to 
extract the fibre information; the approach adopted there is formally similar to 
the metric deformation we ntilize in HDM, except that the parameter 7 multiplies 
the horizontal component of the metric tensor and sent to 00 in the limit process 
(known as the Riemannian Adiabatic Limit [13, 12]). Though there is thus a close 
relation between that approach and HDM, we emphasize that our main focus here 
is the spectral geometry of the fibres rather than their topological invariants. 

6. Discussion and Future Work 

This paper introduced hypoelliptic diffusion maps (HDM), a novel semi-supervised 
learning framework for the analysis and organization of a class of complex data sets, 
in which individual structures at each data point carry abundant information that 
can not be easily extracted away by a pairwise similarity measure. We also intro¬ 
duced the fibre bundle assumption, a generalization of the manifold assumption, 
and proved that under this assumption HDM provides embeddings for both the base 
and the total manifold; furthermore, the flexibility of the HDM framework enables 
us to view VDM and the standard diffusion maps (DM) as special cases. The rest 
of the paper focused on analyzing HDM on the tangent and unit tangent bundles of 
closed Riemannian manifolds, with convergence rate estimated for finite sampling 
on unit tangent bundles. These results provide the mathematical foundation for 
HDM on tangent bundles, and motivate further studies concerning both wider ap¬ 
plicability and deeper mathematical understanding of the algorithmic framework. 
We conclude this paper with a few potential directions for further exploration. 

1) HDM on General Fibre Bundles. We are interested in providing a more general 
mathematical framework for studying HDM on a wider class of fibre bundles. 
This is necessary and interesting, since data sets of interest to HDM (such as 
shape collections or persistent diagrams) are naturally modeled on fibre bundles 
that are more general than tangent and unit tangent bundles. The theory of 
shape spaces [64] is of particular importance in this direction, since the concepts 
of horizontal and vertical Laplacians are readily available in the Sub-Riemannian 
literature [63, 75, 5]. 

2) Spectral Convergence of HDM. The convergence results in this paper are point- 
wise; as in [8, 84], we believe that it is possible to show the convergence of the 
eigenvalues and eigenvectors of the graph hypoelliptic Laplacians to the eigenval¬ 
ues and eigenvectors of the manifold hypoelliptic Laplacians, thus establishing 
the mathematical foundation for the spectral analysis of the HDM framework. 
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Moreover, the hypoelliptic diffusion maps differ from diffusion maps and vector 
diffusion maps in that the fibres tend to be registered to a common “template”, 
which, to our knowledge, is a new phenomenon that is addressed here for the 
first time. 

3) Spectral Clustering and Cheeger-Type Inequalities. An important application of 
graph Laplacian is spectral clustering (graph partitioning). In a simple case, for 
a connected graph, the eigenvector corresponding to the smallest positive eigen¬ 
value of the graph Laplacian partitions the graph vertices into two similarly sized 
subsets, in such a way that the number of edges across the subsets is as small as 
possible. In spectral graph theory [23], the classical Cheeger’s Inequality pro¬ 
vides upper and lower bounds for the performance of the partition; recently, [4] 
established similar results for the graph connection Laplacian, the central object 
of VDM. We believe that similar inequalities can be established for graph hy¬ 
poelliptic Laplacians as well, with potentially more interesting behavior of the 
eigenvectors. For instance, we observed in practice that the eigenvector corre¬ 
sponding to the smallest positive eigenvalue of the graph hypoelliptic Laplacian 
stably partitions all the fibres in a globally consistent manner. 

4) Multiscale Analysis and Hierarchical Coarse-Graining. Multiscale representa¬ 
tion of massive, complex data sets based on similarity graphs is an interesting 
and fruitful application of diffusion operators [55, 28]. Based on HDM, one can 
build a similar theory for data sets possessing fibre bundle structures, providing 
a natural framework for coarse-graining that is meaningful (or even possible) 
only when performed simultaneously on the base and fibre manifolds. More¬ 
over, since the hypoelliptic diffusion matrix is often of high dimensionality, an 
efficient approach to store and compute its powers will significantly improve the 
applicability of the HDM algorithm. We thus expect to develop a theory of hy¬ 
poelliptic diffusion wavelets and investigate their performance on real data sets 
with underlying fibre bundle structures. 


Appendix A. The Geometry of Tangent Bundles 


In this appendix, we briefly summarize some preliminaries on the geometry of 
tangent and unit tangent bundles. For the Sasaki metric [78, 79], readers may 
find useful the expositions in [44, 31, 65], or jump start from [30, Exercise 3.2]; for 
the unit tangent bundle, some results are collected in [15, 14] and the references 
therein. We define horizontal differential operators by directly lifting vector fields 
from the base manifold to the fibre bundle, which in principle applies to any diffusion 
operators [37]. 

A.l. Coordinate Charts on Tangent Bundles. Let M be a d-dimensional 
Riemannian manifold, TM its tangent bundle, and n : TM ^ M the canoni¬ 
cal projection from TM to M. In a local coordinate chart (U;x^,- ■ ■ ,x‘^) of M, 
{d/dx^, ■ ■ ■ ,d/dx‘^} is a local frame, and we write the basis for T^M as 



A trivialization for TM on U can be chosen as 


(x, n) !->■ (x^, • • • , x"^, • • • , , X e U,v e TxM, 
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and we write 


{x,v) 


d 

’ dx‘^ 


{x,v) 


JL 


{x,v) 


d 

’ dv‘^ 


{x,v) 


for a natural basis for T(a, „)TM. It is immediately verifiable that 





d 

dxi 



= 0, j = ,d, 


where d'K(^x,v) '■ T{x,v)TM —>■ T^M denotes the differential of the canonical projection 
TT at {x,v) € TM. Note our usage of ‘ja;” and to distinguish tangent vectors 

in TxM or T(^x,v)TM. 

Even when a connection is not present, the vertical tangent vectors to TM are 
well-defined. It suffices to take the subspace spanned by tangent vectors along the 
z;-coordinates 


(A.l) 


VT(^x,v)M := span 


d 


dv3 


{x,v) 



Ker {dTT(x,v)) ■ 


We call the subbundle of TM consisting of all vertical tangent vectors the vertical 
tangent bundle 


(A.2) VTM ■= VT(x,v)M = KeT{dTT). 

(x,v)gTM 


This immediately gives 

T (TM) jVTM ~ tt*TM. 

Using the Levi-Civita connection on M, we can find another subbundle of T {TM), 
called the horizontal tangent bundle 

(A.3) HTM := HT^x,v)M 

(x,v)eTM 


where 


(A.4) HT(x,v)M := span < 

dx^ 

- (x) ^ 

^ dvd 




{x,v) 

(x,v) ) 


The symbols T^^- are the connection coefficients of the Levi-Civita connection, or 
the Christoff el symbols. The tangent bundle of T M splits into the direct sum of its 
horizontal and vertical components 


(A.5) T (TM) = HTM 0 VTM. 

The Sasaki metric is a natural metric [44] on TM. For two tangent vectors 
X,Y G T(^x,v)TM, choose curves in TM 

a:t^{p{t),u (t )), f : s ^ {q{s) ,w (s)), 

such that 

p{0) = q (0) = X, u(0) = w (0) = V, 
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and define 

(X, = {dTT (X), dTT (X)), + ( ^ (0) > ^ (0)) > 

where Du/dt and Dw/ds are covariant derivatives. Using the horizontal-vertical 
splitting of T (TM), this metric can be equivalently defined as 

(X, = {dn (X), dTT (X)), if X, X G 

(A.6) (X, X)(,,„) = (X, X), if X, X G 

(X, X)(,,„) =0 if X G X G Ur(,,,)M. 

In words, the Sasaki metric imposes orthogonality between horizontal and verti¬ 
cal tangent bundles, and adopts metrics on HTM and VTM induced from the 
Riemannian metric on M. 


A.2. Horizontal and Vertical Differential Operators on Tangent Bnndles. 

Let r denote the Christoffel symbols of the Levi-Civita connection on M. Define 
the horizontal lift operator ^ : T^M —>■ T(2; from the tangent space of M at 

X G M to the tangent space of TM at (x, v) G TM, by 


if 


(— 


ydxi 

/ dxi 

X/ 




{x,v) 


dyP 


{x^v) 


It is direct to verify that this definition is independent of coordinates, as for (A.4). 

if can be used to define “horizontal” first order differential operators on TM: 
just lift vector fields from M to TM. For instance, the gradient operator on M, 
denoted as V : C°° (M) —>• F (TM), is defined as 


(V/, v)g = df {v) for all v G TM, 


or in local coordinates 






A 

dx^ 


Note that here f is a smooth function on M. The horizontal gradient operator on 

TM can be defined using if as follows 

(A.7) 


V"/(x,x) = g*'= 


= »“<") 11 ? 




I 9x® 


df 


{x,v) 




{x,v)) 

— 

1 5x® 




{x,v) 


{x,v) t 


Similarly, for the Laplace-Beltrami operator on M 

1 
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(where |g| = Idet^j), its horizontal counterpart is 
A^f{x,v)= ^ 


yj\9{x)\ 


(— 

u 

ydx^ 

J . 


A 

)1 

ydx’^ 

J- 


(A.8) 






{x,v) j 


^/\g{x)\g^'^ {x) ( ^ 


r/3 / a 


{x,v) t 
k 


In a geodesic normal neighborhood centered at some fixed x G M, (a;) = 0 for 
all indices 1 <i,j,k< d. Consequently, (A.7) and (A.8) simplify as 

d 


(A.9) 

and 

(A.IO) 


= Eli 






(a:,!’) 


JL 

dx^ 


{x,v) 


A^f{x,v)=Y, 


d^f 


^ d {x^) 


{x,v) 




{x,v 


The definition of vertical differential operators does not depend on the Levi- 
Civita connection. The vertical gradient operator onTM is simply 


(A.ll) 


V''f{x,v)=g^'^ (x)^ 


{x,v) 

and the vertical Laplace-Beltrami operator on TM 

1 d 


A 

dv^ 


{x,v) 


A^ fix, v) = 


(A.12) 






{x,v) ^ 


= g^^ (x) 


d^f 


dv^dv^ 


{x^v) 


The coordinate independence of these vertical differential operators follows from 
the observation that the ^-components of the coordinates “behave like” the x- 
components under change of coordinates: 


(A.13) 


'A 

dx^ 


.dx’^ 

d 


= v^ — 

d 

dxi 

X 

dx^ 

X 

dx^ 

X 

dx^ 

X 

dx^ 

X 




dx^ 

dx^ 




dx^ 

dx^ 


dv^ dx^ dv^ dx^ 

dyk Q^k ’ Qyk Q^k ’ 


or equivalently, they have the same Jacobian. Therefore, the coordinate invariance 
of V and A is equivalent to the coordinate invariance of V'^ and A'^. For the same 
reason, the volume form on T^M 


(A.14) 


dVj; (v) = \/\gix)\dv^ A-■■ A dv^, 
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is also coordinate invariant, since the volume form on M 
(A. 15) dvoljvf (x) = ^/\g (a;)| dx^ A • • • A dx‘^ 

is well-defined. The volume element on TM with respect to the Sasaki metric is 
locally the product of the volume forms in (A. 14) and (A. 15). According to Liou- 
ville’s Theorem, this volume element is invariant under geodesic flows (see, e.g.,[30. 
Exercise 3.14]). Note that the existence of a volume element on a general fibre 
bundle is not guaranteed; the tangent bundle is special in that its total manifold is 
always orientable regardless of the orientability of its base manifold (see, e.g., [30, 
Exercise 0.2]). The volume element on the tangent bundle of M induces a volume 
element on the unit tangent bundle of M, as we shall see below. 


A. 3. The Unit Tangent Bundle as a Subbundle. The hypoelliptic diffusion 
map constructed in Section 3 works for both compact and non-compact manifolds. 
In practice, due to the constraint of finite sampling, we prefer to apply HDM to 
a compact object. Assuming M is compact is not sufficient, since TM is always 
non-compact. This motivates us to apply HDM to the unit tangent bundle UTM, 
a natural subbundle of TM with compact fibres. 

Following the notations used in Section A.l, a unit tangent bundle over a Rie- 
mannian manifold {M,g) is a subbundle of TM, with hbre over x G M consisting 
of the tangent vectors in T^M with unit length: 

UTM := S, := {v G T,M \ g, (v, u) = 1} C T,M, 

xGM 

where g^ (•,•) denotes for the inner product on T^M, defined by the Riemannian 
metric on M. Note that UTM is a hypersurface of TM, and thus induces a metric 
from that of TM. The volume form on UTM with respect to the induced metric, 
known as the Liouville measure or the kinematic density [21, Chapter VII], is the 
only invariant measure on UTM under geodesic flows. 

We can define the gradient and the Laplace-Beltrami operator on Sx with respect 
to the metric on UTM, denoted as and , respectively. The vertical spherical 
gradient and vertical spherical Laplace-Beltrami operator can be defined 
through and similar to (A. 11) and (A. 12): 

Vsfix,v):=VsJix,v), X G M, V G Sx, 

(A.16) 

f {x,v) ■.= As,„f {x,v), xGM,vGSx, 


where we implicitly identified / with its restriction on Sx when and are applied to 
it, as long as no confusion arises. 

In order to define the horizontal lifts of V and A from M to UTM, we take 
advantage of the fact that UTM is a subbundle of TM, and set 


Vf/(x,u) := f{x,v), 
^sf{x,v) := A^ f{x,v), 


where / {x, u) is an arbitrary extension of / from C°° {UTM) to C°° (TM). 

We can show that the definition in (A. 17) does not depend on any specific choice 
of extensions. In fact, from (A.7)(A.8) it is clear that the value of f,A^f at 
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{x, v) G Sx only depends on the data of / along the flow generated by vector helds 


(A.18) 


d 


ta/3 / \ a ^ 

{x,v) 


{x,v) 


j = 1, • • • ,d. 


Thus it suffices to show that such a flow, if started at a point {x,v) G UTM, 
will remain on UTM for all time. Consider a curve j : t TM starting at 
{x,v) G UTM that follows along the direction of one of the vector fields in (A.18), 
say the one indexed by j. Write 7 (<) in coordinates as 


7 ii) = (x (t) , V (t)) = (a;^ (f), • • • , x'^ {t) ,v^ (t), ■ ■ ■ (t)) . 

By construction, 

dx’^it) d dv’^it) d 


/«) = E 


k=l 

d 

dxi 


dt dx^ 


(x{t),v{t)) 


dt dv^ 




-rf,(x(t))u“ (t) 


(x(t),v(t)) 


d 

dv^ 


(x{t),v{t)) 


which implies 


dx^ (t) 
dt 


= 1 , 


dx’^ (t) 
dt 


= 0, A: 7^ j. A: = 1, • • • , d. 


and 


dv^ (t) 


=-^^j{x{t))v‘^{t), =0^k^j,k = l,---,d, 


by linear independence. In other words, v (t) is indeed the parallel transport of 
V (0) along curve tt o 7 : t i-a a; (t) on M, since v {t) satisfies 


dv^ (t) 
dt 


ATy- (x {t))v^ (t) 


dx^ (t) 
dt 


= 0 , 


A: = 1, • • • ,d. 


which is the same equation that defines the parallel transport on M, from v ( 0 ) and 
long the curve 7 ro 7 (t) = x (t). In other words, if 7 (0) = (a; (0), v (0)) G UTM, then 
7 (t) = (x (t) ,v (t)) will stay on UTM for all time, since the parallel transport is an 
isometry between tangent spaces and thus preserves the unit length of z; ( 0 ) = v. 


Appendix B. Proofs of Theorem 4.1, 4.5, 4.12, and 4.14 

B.l. Proofs of Theorem 4.1 and Theorem 4.5. We include in this section a 
proof of Theorem 4.5. The proof of Theorem 4.1 can be similarly constructed. The 
Einstein summation convention is assumed everywhere unless otherwise specified. 

Our starting point is the following lemma, in reminiscent of [27, Lemma 8 ] and 
[81, Lemma B.IO]. 

Lemma B.l. Let $ : K —> K 6 e a smooth function compactly supported in [0,1]. 
Assume M is a d-dimensional compact Riemannian manifold without boundary, 
with injectivity radius Inj (M) > 0. For any e > 0, define kernel function 

(B.l) » (AIM) 
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on M X M, where d\j (•,•) is the geodesic distance on M. If the parameter e is 
sufficiently small such that 0 < e < i/lnj {M), then the integral operator associated 
with kernel 


(B.2) 


{^eg){x):= <I>e{x,y)g{y)dvo\M{y) 

Jm 


has the following asymptotic expansion as e —>■ 0 


(B.3) {<I>,g){x)=e-^ 


mog {x) + ( Am5 (t) - ^Scal (x) g{x) \ +0 (e^) 


where mQ,m 2 are constants that depend on the moments of ^ and the dimension d 
of the Riemannian manifold M, Am is the Laplace-Beltrami operator on M, and 
Scal(x) is the scalar curvature of M at x. 

Proof. We put everything in geodesic normal coordinates centered at x G M. If 
dn {x, y) = r, let y have geodesic normal coordinates (s^, • • • , s"^) such that (s^) + 
• • • + {s'^)^ = r^. Note that 


(B.4) 


IM 


{x, y) g [y) dvoW {y) = 


$ 


[ ^ g (s) dvolM (s) 

dB^(o) V e y 


dll i^^y) 


9 {y) dvolM {y) 


where 

g{s)=g (s\ • • • , s"*) = go exp^ (s^ei +-h s'^Cd) , 

dvolM (s) = dvolM (s^, • • • ,s^) = dvolM (exp^ (s^ei + • • • + s^ea)) 

with {ci, • • • , Cd} being an orthonormal basis for T^M. A further change of variables 

1 W 

T 




~d ^ 

, s = r = —= 

Ve ve 


leads to 


(B.5) 


[ $ ) g (s) dvolM (s) = / 4* {r"^) 9 (v^ s) dvolM (v^ s) 

V e y Jbi(o) 

= [ $ (f^) g (i/e s\ • • • , s^^) dvolM (Ve • • • , Ve ■ 

Jb, fol 


Recall [70] that in geodesic normal coordinates 


dvolM (s^, ■ 


0 = 


1 - ^Rki (x) s^s‘ + O (r^) 


ds^ ■ ■ ■ ds'^ 


where Rki is the Ricci curvature tensor 

Rki {x) = 9^^Rkiij ( t ) . 

Thus 

(B.6) dvolM (\/e • • • , s'^) = 1 - (x) s'^s'+ O 

In the meanwhile, the Taylor expansion of g (s^, • • • , s'^) near x reads 

9{s\--- ,s^)=g{0) + ^{0)s^ + l^^i{0)s'^s^ + O{r^) 


■e^ds^ ■■■ds‘^. 
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(0) . 


and thus 

(B.7) ~g{V~e~s\--- ^= 9 (0) 

Combining (B.4)-(B.7) and noting that 

[ $(f2) . Ve-|^(0)s^ -e5d5i---(is‘^ = 0 

by the symmetry of the kernel and the domain of integration Bi (0), we have 

/ ’^f {x,y)g{y)dvo\M{y)= $ (f^) g (yes\ • • • , yes'^) dvolM (v^s) 

dg 

dsi 


! M 


'Bi(O) 


$ (r^) 


g{x) + yfe- ^{0)~s^ +e-\J^{0)s^s^ +o[eif^^ 


2 ds^ds'- 
1 - ^Rki (x) s^s’- + O 


e^ds^---dr 


d 

= 62 


g{x) $ ds + I 

JBi(O) 


1 d^g 

2 ds^ds^ 


(0) - ^9 (x) Rki (x)] [ $ (r^) s'^s'ds + O (e^) 

0 / dsifo) 


g{x) f ^(r^) ds + e(^-^-^iO)-^g{x)Rkk{xn f $ (f^) (s'=)^ ds + O (e^) 

Jbi(o) \^5(s'=) b J Jbi(o) 

where the last equality follows from the observation that 

[ $ (f^) s*^s' ds^ • • • ds'' = 0 if fc 7 ^ Z 

7bi(o) 

again by the symmetry of the kernel and the domain of integration Bi ( 0 ); the 
O term vanishes due to the symmetry of the kernel (as argued in [82, §2]). 

The constants mg,m 2 can now be explicitly characterized: 

mo := / $ (f^) ds^ • • • ds'' = / $ (f^) [ da = w''-^ [ $ (r^) ^''-^df, 

Jbi{o) Jo Jsi{o) Jo 

m 2 ■= $ (f^) (s'')^ ds^ • • • ds'' independent of fc S {1, • • • , d} by symmetry. 

Jbi{o) 

Finally, recall that in geodesic normal coordinates 
d q 2 ~ d 

= *«(-). E ^kk (^) — ScrI (3^) 5 


from which the conclusion follows: 




{x, y) g {y) dvoW (y) = e" 


! M 


mog (x) + em 2 ( ^ AmS {x) - ^Scal (a:) g{x)] +0 (e^) 


= e 2 


mog {x) + ( AmA (a;) - :jScal (a;) g{x) \ +0 (e^) 


1 , 


□ 

In order to study we need an expansion for the parallel transport term 

Py^xV- This is established in Lemma B.2. 
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Lemma B.2. Let M be a Riemannian manifold, x G M, v € T^M. In geodesic 
normal coordinates around x, the parallel transport of v along a geodesic 7 : t i-A 
expats (t G [ 0 , e], || 0 ||tj,m = IJ has the following asymptotic expansion: 

{Pexpje,x {v)y =v^ - (x) + (a:)) + O (t^) 

where Py^x '■ TxM —>■ TyM denotes the parallel transport from TxM to TyM along 
the geodesic segment connecting x and y. 

Proof. Let • • • ,s‘^} be a geodesic normal coordinate chart centered at a; € M. 
Assume v G TxM takes the expression 


= E^ 


dsi 


and let y : [0,e] —>■ TM be the parallel transported vector field along the given 
geodesic 7 

d 


d 




i=i 


7(t) 


Note that (s^) + • • • + (s^^) = t^. Recall that V being parallel along 7 means 

d 


0 = (t) = Y^[^{t)+Y^^ it) (t) n, (7 (t)) I £j 


dY 


d 


i=i 


dt 


k,l=l 


dt 


dxi 


7(i) 


or equivalently 

dVi 
dt 


(B. 8 ) 


PL, fjryk 

(i) + E ^ (t) R' {t) Til (7 (t)) = 0 for all j = 1, • • • , d. 

k,l^l 


Let (t; 0^, • • • , 0^^) be the geodesic polar coordinates corresponding to (s^, • • • , s'^). 
By iteratively using (B. 8 ), 

Rf (0) = 

dV^ 


d 


^ (0) = - E ^ (0) (0) ni (7 (0)) = 0 

kd^l 


d 


(0) = - E " 


dt‘^ 


k,l^l 

d 


dt 


i=0 
d 


dj^ 

dt 


(t)y'(t))ri.,(7(o))- ^ ^(o)^'(o) 

^ k,l^l 


^ kl 


t^O 


= - E E (0)) (^)+RkJ (^)) 


k,l=l (T = l 

where the last equality for —^ follows from a simple calculation of Christoffel 

dP 

symbols, as follows. Recall that in geodesic normal coordinates 
g^J = S,j + ^Rikij {x) s'"s'- + O (t^) 

hence 

^^,g^J = {x) s‘ + ^Riky.j (x) s'"+0 (f) = ^s'" {R^ykj (a:) + R^k^,j {x))+0 (f^) . 


(7 it)) 
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Plugging these partial derivatives into the expression of Christofeel symbols to 
obtain 

P'^/ — 2^ ^ “t“ digki> dygkl) 

= \[5^^+0{t^)]^ 

— I^S {^Rlkau (^) H“ Rlakv (^) “t” Rklav (^) “t“ Rka-lif (3^) Rkual (^) Rkaul (^)) ~\~ O {t 

= [2Ri,k. (X) + 2Rk.i, (a;)] + O (f) = (r^J {x) + R^J (x)) + O (t^) , 

therefore 

(x) = ^ (x) + R,^J (x)) + O (t) 

which verifies the last equality for (0). Therefore, we have the following 

Taylor expansion for (t) up to the second order: 

it) = [r,J ix) + R^J (x)) + O it^) , 

which establishes the desired conclusion. □ 


Remark B.3. It is also useful to note that 

(i) = ^4 _ (^) + + O (t^) . 

where (s^, • • • , s'^) are the geodesic normal coordinates of y in the proof of Lemma B.2. 
Moreover, it is not hard to obtain higher order terms in the asymptotic expansion. 
For instance, differentiating both sides of (B. 8 ) twice, we have 
(B.9) 


dt^ 


( 0 ) 


V — 


k,l^l 

d 

E 

kd^l 


dt‘^ 

dt‘^ 








it) P' it)) Vi, (7 ( 0 )) - ^ ^ ( 0 ) p' ( 0 ) 


dt 


k,l = l 


dt 


dt'^ 


ndiit)) 




2 E T 

2-^ di 








_d 

dt 


ni hit)) 

t=0 


Note that P^; (7 (0)) = 0 since the coordinate system is geodesic, the first term in 
the right hand side of (B.9) vanishes. In the meanwhile, since in geodesic normal 
coordinates the parametrization of geodesic 7 is linear, we have 

( 0 ) = 0 , k = l,--- ,d. 

Combining this observation with the computation 
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we conclude that the last term in the right hand side of (B.9) also vanishes. There¬ 
fore, (B.9) is left with 




( 0 ) 



k,l^l 


dt 


(o)y' (0) 


dt"^ 


rL (7(0) 


0'^v‘d^^rii (x) {x). 


We can compute the second order derivative of the Christoffel symbol at x (see, 
e.g., [43]) and completely determine the O (0) term; but this is less crucial for our 
application. The only point in carrying through the computation of the third order 
derivative of {t) is that the O (t^) term in the expansion of Lemma B.2 is a third 
order homogeneous polynomial in the geodesic coordinates (s^, • • • , s"^): 

(0) = (x) = (x ), 

an observation that is necessary for dropping the higher order error term in Lemma B.4 
from O ^£5^ to e^, as we will see below. 

Armed with Lemma B.l and Lemma B.2, we are ready to take a step at analyzing 
K^^s- Lemma B.4 starts our investigation of kernel functions incorporated with 
parallel-transports. Virtually it only deals with K^ s with <5 —0, but we’ll soon see 
that it opens the door for understanding much more general cases. 


Lemma B.4. Following Lemma B.l, let Py^x '■ T^M —>■ TyM denote the parallel 
transport from T^M to TyM determined by the Levi-Civita conneetion on M. For 
any function f G C°° (TM), as e —>■ 0, 


(B.IO) 


'M 


{x, y) f (y, Py.xv) dvolM (y) 


= e’' i rnof (x,v) + e 


m2 


1 


A f (x,v) --Scal(x} f (x,v) 


O 


where is the horizontal Laplacian on TM defined in (A. 8 ). 


Proof. Consider the geodesic normal neighborhood around x G M, with e > 0 
sufficiently small such that a geodesic ball of radius ^/e centered at x is contained 
in this neighborhood. The integral is actually supported only on such a geodesic 
ball, due to the compact support of $ 

[ (.X, y) f {y, Py,xv) dvo\M (y) = f $ \ j Py,xv) dvolM (y) 

Jm Jm \ £ / 

= f $ {x,y) \ j, cJvoIm (y) ■ 

Jb^,(x) \ e J 

Express y G (x) in geodesic coordinates (s^, • • • ,s‘^), with 

(s^)^ -b • •• -b {s‘^f =r^ =dli (x,y), 
and put y into polar coordinates 

y = exPa; rO, 6 G TxM, || 6 »||,j, = 1. 
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Recall from Lemma B.2 that the j-th coordinate component (j = 1, • • • ,d) of Py,xV 
is 


b 

1 

II 

{dilak (a^) + Rkal' i^) 

) + O (r^ 

= ( 
6 ' 

[Riak + RkJ (a;)) 

+ O (r^) 

A further change of coordinates 



... 

.d _s\ r 



’ “ V^’ V-e 



leads to 

(BJl) 

/ 


$ 


dll y) 


f (y, Py,xV) dvolM (y) 


/ ^ ^ (v) ^ ■ ■ ■ ’ ■5'^’ (.Py,xV)^ G ■ • , {Py,xvf^ dvo\M (s\ ■■■ ,s‘^) 


= J ^ ^ (r^) f ■ ■ ,e^^s‘^,iPv,xv)^ ,■ ■ ■ ,{Py,xvf^ dvolM {s^,-■ ■ ,s‘^) 

where / denotes for / in these geodesic normal coordinates, 

i.Py,xvy =v^ - {RiJ {x) + (a;)) + O 


and 


dvolM (s\-- - ,s"*) = 


1 - -^Rki (a:) s^s’- + O (r^) 


ds^ ■ ■ ■ ds‘^, 


dvolM (s\ ■■■ ,s‘^) = 1 - -Rki (x) + O 


yd) 


■ £2 • • • ds‘^. 


Taylor expanding / at (0, v) S T^M in these coordinates, we have 
,e^s'^,{Py,xvf ,■■■ ,{Py,xvf^ 


.df 


df 


= f{0,v) + €^p—{0,v)+ {Py,xvy-v^ —{0,v) + -ss 


ds^ 


J dv^ 


~j d^f 
ds^ds^ 


( 0 , 1 ;) 


1 r. 


+ 2 [^Pv,xv) - V J \^iPy,xV) - V 




dv^s"^ 


The rest of the proof follows from simply substituting this Taylor expansion into 
(B.ll) and integrate term-by-term. For simplicity of notations, let us write 


Too := 


:= [ $ (f=2) ds^ • • • ds^. 

Jbi{o) 


By the symmetry of the domain of integration, 

f $ (f2) S^dS^ • • • ds'^ = 0, j = 1, • • • , d, 

dBi(O) 

f $ (f^) s^s^ds^ • • • ds‘^ = 0, k y I, k,l = 1, - ■ ■ ,d, 
Jb, fO) 


















HYPOELLIPTIC DIFFUSION MAPS I: TANGENT BUNDLES 


43 


and 


m2 := 



(f^) j 


I,- - • ,d 


are constants independent of super-indices 1 < j < d. Following a direct computa¬ 
tion, 


[ $(r2)/(0,u) \l-^Rki{x)s>^rs^ + o(eh^) 

dBi(o) L 6 V y 


e2ds^---ds^ 


= e 2 f{x,v) mo — e—^Scal (x)-I-O (e^) 

L 6 

[ ^{f^)eh^^{0,v) \l-^Rki{x)s’^s^+o(eh^) 

dBi(o) ^ ' ds^ I 6 V y. 

f $ (f) s^ds^ ■ ■ ■ ds'^ + O (e^ 
Jb^O) ^ 


d 1 df 




• e^ds^ • • • ds^^ 
= e^-0 (e") 


[ $(f2) fl-£p,,(a;)5'=S' + ofeif3') 

JBi{0) J OtP L b \ / 


■e^ds^---ds^ 


d 

= 62 


i?0 (x) + O (e^) 


'Bi(O) 




(.ip) 


• e 2 ds^ • • • ds" 


= £2 


m 2 9^/ ! ^ ^ 2 \ 


2 fe=i9(s'=) 


1 r 




'Bi(O) 


= £^• 0 ( 6 ^), 

92 / 


^(O--) [l-5RHWi''5‘ + 0 


(eif3) 


eUs^-'-ds^ 


J iPy,xvy-v^ s'^-^-^{0,v) 1-^Rkiix)s’"y+ o(^€^f^'j ■ ds^ ■ ■ ■ ds 


= ei-0{e^), 


where all O ye ^ j terms drop out as in the proof of Lemma B.l, thanks for Re¬ 
mark B.3. Combining these computation, we have 


! M 


{x, y) f {y, Py,xv) dvolM (y) 


J (^) iPy,xvy , • • • , {Py^xvf'j dvolM (s\ • • • ,s‘^) 


= e^if{x,v) mo — e —^Scal(x) 
L 6 




2 t^idis^y 
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= eilf{x,v) Too - e^Scal(a;) 


+ e 


m 2 




+ 0 (e" 


d 

= 62 


|mo/ (x, x) - e^/ (x, v) Seal (x) + e^A^f (x, v) + O (e^) | 

+ 0 (e^) 


m 2 


= i mo/(x,x) + e — 


A /(x,x) --Scal(x)/(x,x) 


where in the second to last equality we used the expression of in geodesic 
normal coordinates from (A. 10). □ 

The following Lemma B.5 is the unit tangent bundle version of Lemma B.4. 

Lemma B.5. Following Lemma B.l, for any function f G C°° (UTM), as e 0, 

{x, y) f {y, Py,xv) dvolM {y) 


(B.12) 


! M 


= £2 ,jmo/(x,w) + e^ 


/ {x, v) - ^Scal (x) / (x, v) 


Oi 


where Ag is the horizontal Laplacian on UTM as defined in (A. 17). 

Proof. First of all, note that by the metric compatibility of the Levi-Civita connec¬ 
tion, Py 3, : T^M — > TyM is an isometry, which descends naturally to an isometry 
from Sx to Sy. For any function / G C°° (UTM), let / denote its extension to the 
whole TM, as in (A.17). By Lemma B.4, 


/M 


{x, y) f {y, Py,xv) dvo\M (?/) = (x, y) f {y, Py,xv) cZvoIm (y) 


IM 


= e" \ mo/ (x,x) + I 


m 2 

~Y 

(x, v) - ^Scal (x) / (x, v) 

+ 0 (.»)j 

m 2 

f ( 2 ^ 12 ;) - iscal (x) / (x, v) 

+ 0 (.»)j 


□ 

Based on Lemma B.5, we have the following Lemma B. 6 , which carries most of 
the work for proving the first part of Theorem 4.5. 

Lemma B.6. Assume M is a d-dimensional closed Riemannian manifold with 
Inj (M) > 0. For any function g G C°° (UTM) and sufficiently small e G ^0, Inj (M)^^, 

<5 = 0 (e), 

(B.13) 

K^^s(x,v;y,w)g{y,w) dQ{y,w) = / / (x,v;y,w) g {y,w) day (w) dvoln (y) 

JmJs^ 


I UTM 


= imog{x,v) -G (Agg{x,v)- ^Scal (x) g (x, x) 


, .m22 ( /.v ( \ (d- 1) (d- 2) 

+ 5 (a:, - 5 - 


5(x,x)^ +0(e^ + S^) 
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where TOo,m2i,m22 are positive constants, day is the volume element on Sy, and 
Ag , Ag are the horizontal and vertical spherical Laplace-Beltrami operators on 
UTM as defined in (A.16)(A.17). 


Proof. By definition, 
(B.14) 


/ / Ke,s (x, v; y, w) g {y, w) day (w) dvolM {y) 

JmJs„ 

( dir ix,y) dg^ {Py^xV,w) 


MJS,, 


g {y, w) day (w) dvolM (y) ■ 


Since S = O (e), 5 —)■ 0 as e —)■ 0. Applying Lemma B.l, 

^ T^(dM{x,y) dl {Py,xV,w)\ ^ 

K I -, -- g [y, w) day [w) 


d-1 

= (5 2 I Mr 


e ’ (5 

dli (x, y) 


5_Mjdli{x,y) 


giy,Pv,xv) 


^s9 (y> Py,xv) - -Scar" iPy,xv) g [y, Py,xv) 


o(s^ 


where Seal'®" (•) is the scalar curvature of Sy, and Mg (r), M 2 (r) are functions of a 
single variable determined by the following integrals over the unit ball in 

Mo{r^)=[ K {r'^,p^)d9P--de‘^-\ 

M2{r^)= [ {e^fK{r‘^,p^)de^---de‘^-^ 

with 

Moreover, for any y G M, since {TyM, || • |jy) is isometric to the standard d- 
Euclidean space (this can be seen by simply fixing an orthonormal basis in TyM), 
Sy equipped with the induced metric from |j • ||y is also isometric to the standard 
unit {d — l)-ball in As a result. 

Seal®" {w) = {d — l){d— 2) for all y G M, w G Sy. 


Thus 

(B.15) 




d\[{x,y) dg^{Py,xV,w) 


g{y,w) day (w) 




S_Mf dl,ix,y) 


,u„/.. r, (d—l)(d—2) 


^s9iy^Py,xv) - 


9iy,Py,xv) 


0(6^ 
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It remains to apply Lemma B.5 multiple times to (B.15) to obtain 
/ / {x, v; y, w) g (y, w) day (w) dvolM (y) 

JMJSy 

d-M y) w) 


K 


IM 


= S 2 


Mn 


IM 


g {y,w) day {w) 


dvolM (y) 


d-M (x, y) 


9 {y, Py,xv) dvohi (y) 


-Mo 


IM 


= 0 2 <( £2 


dh i^^y) 


^s9 {y, Py,xv) - -{d-l){d-2)g (y, 


mog{x,v) + e^^ ( Afy (x, 4 ;) - ^Scal (a:) y (x, u) ) + O (e^) 


dvolM (y) + O (i5^) > 


, d .m22 ( .V ! \ {d-\)[d-2) 

+ £2 A^y(a;,4;)--- 


= £ 2 ^ 2 moy (xju) + ( Afy (a:,w) - ^Scal(x)y (a;,?;) 


y (a;,u)^ +£ 2-0 (£^ + 5^) | 


, ;-W22 I /.v i X (d- 1) (d- 2) 
+ A^y (a;, u) - ---- 


gix^v)^ +0{e‘^ + 6^) |, 


where mo, mi, m 2 are constants determined by the following integrals of Mq (r^) 
or M 2 (r^) over the unit ball in 


mo = I Mq (r^) ds^ ■ ■ ■ ds^, 

JBfiO) 

m2i = [ Mq (r^) (s^)^ ds^ ■ ■ ■ ds‘^, 
JBfiO) 

W22 = / M2 (r^) ds^ • • • ds'^ 

JbHo) 


where 


/Bf( 0 ) 

/ 

iBfiO) 

/ 

iBfiO) 

= (s^)%... + (s^)^ 


□ 


Proof of Theorem f.5. As 6 = O (e), direct application of Lemma B .6 gives 
(B.16) 

Pe,s{x,v)= / K^^s{x,v]y,w)p{y,w)dei{y,w) 

JuTM 


lUTM 

d _ d-1 


= £ 2(5 2 mop(x,u)+ £^^ ( Afp(a;,'!;) - iscal(x)p(a;,a;) 


, .^22 / , y / X (d- 1) (d- 2) 
+ AIp{x,v) --- 


p{x,v)^ +0[e^ + (5^) 
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In order to expand the denominator of (4.22), note that 


cud _ q(1) 


= ^ ' {'mop{x,v) + {/\sp{x,v) - ^Sca\{x)p{x,v) 


, ^ (d- 1 ) (d- 2 ) 

+ ( A^p(a;,z;)-^- 


p{x,v)^ + O (e^ + S^) 


= f 2 


^ cc{d-l) 


5 2 {x^v) I \ — 


am2i f A^p{x,v) 1 


2too V p{x,v) 3 


— -Seal (x) 


/ A^p (x,v) _ (d - 1 ) (d - 2 ) \ ^ / 2 j 2 \ | 

2 too \ p(x,v) 3 J ^ J ’ 


and hence by Lemma B .6 


lUTM 


Ke,s (x, v; y, w) p {y, w) d0 {y, w) 


/ / K^^s{x,v;y,w)p^f {x,v)p^g iy,w)p{y,w)day{w)dvolM iy) 

JmJs^ 

077121 f A^p{y,w) 1 


_g^ o.(ri-l) _Q, _Q, , , 

= e 2 J 2 ^0 PeS 


/ / K^^six,y]v,w)p^ °‘{y,w) 

JmJs,. 


1-e 


o 1 / N -q Seal (7/) 

27770 V P{y,w) 3 


-d 


a77722 / (^, 77 ;) _ (d- 1 ) (d- 2 ) 

2too V P{y^w) 3 


^ +0(e2+j2) 


(l-Q)d (1-Q;)(d-1) 

= e ^ 6 2 X 


777oP^ “ (X, 7 ;) 
, 77721 


/Afp(^ 1 \ 

‘ Sm, ( p{.,v) 3®“*™) ^ 


0777 22 /Agp(x,v) (d—l)(d—2) 

2 r? 7 o \ p (x, v) 3 


Afp^ “ (x, 7 ;) --Scal(x)p^ ix,v) 


+ S 


77722 


A^p^-“ (x,7;)- 


(d-l)(d- 2 ) , 


X “(x,7 


+ 0{e^ + d^) 


(l-a)d (l-a)(d-l) , , 

= ^ —^77iy“p-“ (x, v) pi-“ (x, v) X 

/ 1 _ ^ *^"^21 / Afp (x,x) _ 1 A _ ^ 077722 / A^p (x, x) _ (d - 1) (d - 2) 

I ^ 2?77o V p{x,v) 3 / 2mo V P{x,v) 3 

+ _ Iscal(x)) +s”^ 

2mo \ pi““(x, w) 3 / 2mo \ p^~°‘{x,v) 3 

+ O (e^ + d^) I 

(l-Q)d (1-Q;)(d-1) T T 

= e 2 ^ 2 rnl ^p^^{x,v)p^ ^{x,v)x 
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1 + e 


^ r^AfpM Iscal (x) 

2 mo p^~°‘{x,v) p{x,v) 3 


+ <5 


TO22 

2mn 


“ (a;, v) _ ^ A'^p{x,v) 


+ 0{e^ + S^) . 


p^ “ {x, v) p{x,v) ' ' 3 

A similar computation expands the numerator of (4.22): 

/ K^,s (a:, y, V, w) f {y, w) p {y, w) dcr^ (w) dvoW (y) 

JUTM 

= K^^six,y]v,w)p~g{x,v)pJgiy,w)f{y,w)p{y,w)day{w)dvolMiy) 

JMJSy 

= {x,v) X 

/ / K,^s{x,y]v,w) f {y,w)p^-’^ {y,w) 1 - - ^Scal{y)\ 

JMJSy [ 2mo \ p{y,w) 3 J 

_ ^ am 22 / {y,w) _ (d - 1) (d - 2) \ ^ / 2 j 2 \ 

2mo \ p{y,w) 3 J ^ 


(l-a)d (1-Q:)(d-1) 

= € ■^ 5 2 TOp>^ “ (a;,u) X 


mof{x,v)p^ {x,v) 

mi 


1-e 


ctm 2 i ( Agp{x,v) 1 


2mo V P{x,v) 


— -Seal (x) ) — S 


,077122 /A^p(x,u) (d—l)(d—2) 

2mo \ p (x, n) 3 




Af [/p^ “] (a;,7;) - -Scal(x) [fp^ “] (x,x) 


[/pi-“] (x,x) - [/pi-“] (x,x) 


+ O (e^ + d^) 


(l-°)d ,(l-c)(d-l) 


= e' 2' d' 2 'toJ “p^“(x,w)pi “(x,x)x 

r, 1 am2i . [ A^p{x,v) 1 \ 077122,. /A^p(x,x) (d-l)(d- 2 ) 

f{x,v)-e- - f{x,v)[^ -r--Scal(x) -d-- f {x,v) < 

Zmo \ p(x^v) 3 / zmo 


V P(a:,w) 


^21 


""^Vfx rl ^Scalfxl^ I d""'" f fx x) f (d-l)(d-2)' 


+ 0{e^ + S^) 


a-a)d ,(l-o)(d-l) 


= e 2 d 2 rug °‘p^g{x,v)p^ “(x,n)x 


/(a;,a^) +e^/(x,7;) 


Af [/P^ “] (a:,u) A^p{x,v) 1 ,, , 

[/pi-“](x,x;) p{x,v) 3^ ^ 




As [/P^ “] A^p (a^’ _ (1 _ (d- 1) (d - 2) 


[/pi-“] (x,w) 


p(x,x) 


+ 0(e2 + d2) 1. 
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Combining expansions for denominator and numerator, we conclude that 

luTM '^) f (y^ '^) p (y^ do {y, w) 


HZsf ( 2 ^’ '^) = 


JuTM y^ p (y^ de (y, w) 




- i (1 - a) sc.1 (.) 

[JP \{x,v) p{x,v} 3 


r ^22 „ / ^ 

+ - f{x,v) 

2mo 


[fP^ ^^P{x:v) {d-l){d-2) 

- — a—^ (1 — aj---- 


1 + e 


[/pi °‘]{x,v) p{x,v) 

Afpi-“(a:,u) A^p{x,v) 1 


+ 0(e 


+ (5 


m2i 

2mo [ pi““(x,u) 
m22 




A[^pi “(a;,u) A^p(ai,'(;) (d-l)(d-2) 

-—3 


= /(a;,u)< 1 + e 


2mo [ pi “ (ic, v) 
W 21 


+ 0(e2 + <52) 


2too 


/P (a;,f) A^p{x,v) 1 ^ 

■ rr i-ai/ —“ I-^-o 1 - a) Seal {x) 

/pi °‘]{x,v) p{x,v) 3 


+ (5 


-(5 


W 22 

2mo 

W 21 

2too 

1X122 


As [fp^ °‘] {x,v) A%p{x,v) (d-l)(d-2) 

[/pi-“](x,u) “ p(x,u) ^ 3 

Afpi-" (x,v) Afp(x,v) _ 1 ^ ^ g 1 . / 

p^-(x,v) “ p(t,u) 3^^ «)Scal(x) 

'A[^pi"“ (a;,u) _ ^ A^p (a;,u) _ (d- 1) (d-2) 


2mo [ pi “ (x, v) 


p{x,v) 


+ O (e^ + ^2) 


= /(x,u) 1 + 


.W 21 

2TOn 


Af [/p^ “](a^,'e) Afpi “(x,u) 


[/pi-“] (a;,u) 


pi““ (t, u) 


+ (5 


W 22 

2mn 


As [/P^ “] A^pi “(a:,u) 


[/pi-“] (x,u) 


pi““ (x,v) 




+ 0(e^ + 6^) 


= fix, v) + e 
+ S 


m2i 

2mo 

1X122 
2 mo 


Af [/P^ “] Afpi “(x.u) 


pi““ (x, v) 

A^ [/pi-“] ix,v) 


pl““ (x, v) 


- fix, v) 


pi““ (a;, u) 
A[^pi-“ (a;, u) 


pl““ (x, v) 


+ O (e2 + ^2) . 


□ 


A proof of Theorem 4.1 can be composed with a similar direct computation. 
The only prerequisite is to establish a tangent bundle version of Lemma B.6, using 
Lemma B.4 instead of Lemma B.5. Similarly, a proof of Proposition 4.2 can be 
derived from the following proof of Proposition 4.6. 

Proof of Proposition 4-6. To establish (4.24), first note that 

lim pe-ye (a:, u) = lim / / ix,v,y,w) p iy,w) day iw) dvo\M iv) 

x^°° JmJs„ 
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= lim 

JmJs, 




d\j{x,y) ds^iPy,xV,w) 


e je 


lim K 

ImJs^ 


dli (x, y) ds^ {Py,xV, w) 


K 


mJs^ 

= 1K 


dlf ix,y) 


7e 

,0]p{y,w) day (w) dvolM (y) 


p (y, w) day (w) dvoW (y) 


p {y, w) day (w) dvoW {y) 


dll V) 


piy,w) day {w) 


dvolM {y), 


since Py^xV does not depend on 7 . Recall from (4.19) that 


and define 


Ke {x,y) = K 


p{x)= p {x, v) dVx (v), 

ds^ 

-i7 ( dh (x, y)\ _ ( dh (x, y) 


= K 


0 , 


Rr“ {x,y) = 


Ke jx^y) 

Pe {x)rAyy 


Then 

(B.17) 


{x,v) = 


7—>-oo 


f a 

r 




/ {x,y) 

/ 

f {.y,w)p{y,w) day {w) 

dvoW (y) 

JM 

as. 




f kAx 

JM 

y) 

/ P (y, w) day (w) 
_JSy 

dvolM (y) 


{x,y) 


IM 


f[y,w) -jr^day [w) 


p {y) dvolM (y) 


K't {x,y)p{:y)dYo\M [y) 


f M 


(x, y) f (y) p (y) dvolM (y) 


IM 


Ke y) p (y) dvolM (y) 


IM 


By [27, Theorem 2], as e —^ 0 
'{x,v) 

Ke (x, y) 7 {y) p (y) dvolM (y) 


lim Hf f {x,v) 

7—>-00 ’ ' 


IM 


(B. 18 ) 


Ke {x,y)p{y) dvolM {y) 


IM 




Am [fp^ “] (x) AmP^ “ (a;) 

/ [Xj —1 — 1 


(x) 


(x) 


0{e 
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where 


L 


sf(0) 


K (r^) • • • ds‘^, 



sf(0) 




and we again dropped the higher order error term from O 
in [82, §2], 


^£ 2 ^ to O (e^), as argued 
□ 


B.2. Proofs of Theorem 4.12 and Theorem 4.14. To prove the two finite 
sampling theorems, we’ll follow the path paved by [7, 45, 82, 81]. 


B.2.1. Sampling without Noise. The following lemma builds the bridge between the 


geodesic distance on the manifold and the Euclidean distance in the ambient space. 

Lemma B.7. Let l : M ^ be an isometric embedding of the smooth d- 
dimensional closed Riemannian manifold M into For any x^y € M such 

that dM (x, y) < Inj (M), we have 



where 9 S T^M, ||0||,j, = 1 comes from the geodesic polar coordinates of y in a 
geodesic normal neighborhood of x: 


y = expire, r = dM {x, y) ■ 


Proof. See [85, Proposition 6]. 


□ 


The reason we need Lemma B.7 is due to the fact that the hypoelliptic diffusion 
operator in (4.18) is constructed using geodesic distances on the manifolds, whereas 
in practice only the Euclidean distance in the ambient space is observed. In order to 
prove Theorem 4.12, it is convenient to introduce the “Euclidean distance version” 
of the hypoelliptic diffusion operators. Note that in Definition 4.11 the hat is 
used for empirical quantities; for the remainder of this appendix, the tilde will 
be used for quantities in hypoelliptic diffusion operators that replace the geodesic 
distance with the Euclidean distance. These quantities include^ 



Pe,s {x,v) 


i 


UTM 

Ke,s ix,v;y,w) 


Ke,s {x, v; y, w) p {y, w) dQ {y, w ), 


{x, V] y, w) 


Pe,s ix^v)plgiy,w)' 


and eventually 



^Note that in this subsection Ke,S is not much different from ^“5, since they are both con¬ 
structed from Euclidean distance and exact parallel-transports. They will represent quite different 
quantities in next subsection, where is constructed from estimated parallel-transports. 
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The next step is to establish an asymptotic expansion of type (4.23) for We 
deduce the following Lemma B. 8 , the “Euclidean distance version” of Lemma B.l, 
from Lemma B.7 and Lemma B.l itself. 

Lemma B.8. Let $ : K —>■ K 6 e a smooth function compactly supported in [0,1]. 
Assume M is a d-dimensional closed Riemannian manifold isometrically embedded 
in with injectivity radius Inj (M) > 0. For any e > 0, define kernel function 


(B.20) 


{x,y) = $ 


|a;-yir 


on M X M, where H-H is the Euclidean distance on If the parameter e is 

sufficiently small such that 0 < e < -^Inj (M), then the integral operator associated 
with kernel 


(B.21) 


^,g){x):= <^^{x,y)g{y)dvolM{y) 


IM 


has the following asymptotic expansion as e —>■ 0 

m2 


(B.22) 

with 


{^eg)(x) 


d 

= 62 


mo 5 (x) + e— (AmS (x) A E {x) g (x)) + O (e^) 


E (x) = -^Scal (x) + ( 2 ;) 


where mo, m 2 are constants that depend on the moments of ^ and the dimension 
d of the Riemannian manifold M, Am is the Laplace-Beltrami operator on M, 
Seal (x) is the scalar curvature of M at x, and A (x) is a scalar function on M that 
only depends on the intrinsic dimension d and the second fundamental form of the 
isometric embedding l : M ^ . 


Proof. From Lemma B.l, 
A 


$ 


IM 


dii 


g (y) dvolM (y) 


mog (x) + ^AmS (x) - ^Scal (x) g (x) j + O (e^) 


thus we only need to expand 


(B.23) 


IM 


$ 


X - ; 


- $ 


dh {x,y) 


g (y) dvolM (y) • 


Put y in geodesic polar coordinates in a geodesic normal neighborhood of x S M, 
y = exp 2 ,r 6 », r = dM {x,y) ,9 e T^^M, || 6 »||^ = 1 , 
and denote the geodesic normal coordinates around x as (s^, • • • , s'^). By Lemma B.7, 


- dlt (x, y) = -^dlj (x, y) ||n (6», 6»)f + O (x, y)) 
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thus 

(B.24) 

$ f 1 


■ - y\ 




dji 

e 


= 


dli i^^y) 


X - 


dh {x,y) 


O 


= 


, f dh {x, y) 


-^/M{^,y)moM?]+o 


;-j/f dh{x,y) 


dh i^^y) 




Recall that <& is supported on the unit interval, which implies that in (B.23) only 
those y G M satisfying ||a: — y\\ < \/e or dM {x,y) < are involved. According 
to Lemma B.7, for sufficiently small e > 0, \\x — y\\ < \/e implies du {x,y) < 
which means that the higher order error in (B.24) is indeed 


O 


dh {x,y) 


= O 




= 0(eh . 


Therefore, 

(B.25) 


Jm 


$ 


■-y\\‘ 


- $ 


dh y) 


9 (y) dvolM (y) 


1 

1 




, / dh {x,y) 


IM 


dh (x, y) ||n (6», 6»)f g (y) dvoW (y) + e" . O 


/ (v) lln(6i,6»)f y (y)dvolM (d) + £= • O (e^) . 


In geodesic normal coordinates (s^, • • • , s'^), 

(B.26) 

l!n {9, 9)\\'^ g (y) dvoW (d) 

= [ <i>'(-)rh\U{e,e)fg{s)\l-lR,i{x)sh^ + 0{rh 

Jb^M Vey [6 

As in Lemma B.l, we Taylor expand y (s) around s = 0 

~g{s\--- .s'") =y(0) + ^(0)s^ + i^^(0)s'=s' + 0(r3), 
and note that by symmetry 


ds^ ■ ■ ■ ds'^. 




'B^M 


/||n(d, 0 )f s^ds^---ds‘^ = 0 , j = ,d, 

thus (B.26) reduces to 

f (-)rh\Ii{9,9)fgix)ds^---ds‘^ 

Jb^Ao) V e / 

+ f (—\r'^\\U{9,9)\fO{rhds^---ds‘‘ 

dB^rAo) V ^ / 
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where 


(B.27) 


f f—) r'^\\Il{9,e)\f g{x)ds^ ■ ■ ■ds‘‘ 

^B^(o) V e / 

= gix) [ $' (7^) • e‘^f^ ||n (6», 0)f ■ eUs^ ■ ■ ■ s‘^ 

Jbi(0) 

= €i-e^g{x)[ \\Il{e,e)\fd9 [ (f^) f4+(<i-i) 

JSi{0) Jo 

= ei-e^g{x)[ \\U{9,9)fd9[ $'(f^) f3+d^f, 
Js^(0) Jo 


and 


(B.28) 


[ ) r'^\\U{9,9)fO{r^)ds^---ds‘^ 

dB^(o) V e / 

= [ $' {r^) ■ ||n {9, 9)f ■ eO (r^) ■ eUs^ ■ 

Jbbo) 


/Bi(0) 

= ei-0{^). 

Recall from the proof of Lemma B.4, we adopted notation 


7712 = 


JBi{0) 


thus 


m2d= f ^{f‘^)r^ds^---ds‘^ = Wd-i f $ (f^) 

Jb, (O') Jo 


/Bi(0) 

where LOd-i is the volume of the standard unit sphere of dimension (d — 1). Denot¬ 
ing 

|2 


Aix) = — f ||n(d,d)ir 

Wd-i ds, (O') 


d9 


i-l JSiiO) 

as the average of the length of the second fundamental form over the standard unit 

sphere, we can write (B.27) as 

(B.29) 


i-e'^g{x)f \\U{9,9)fd9 f (f^) 
dSi(O) Jo 


= e 2 • e^g (x) A (x) uJd-i ■ ( — d (d -I- 2) ) = —• e^ '-^d {d + 2) g (x) A (x), 


2ujd- 


2™2 


where we integrated by parts 


$' (7^) f3+d^~ f (^) f (^) 




i+f 


?=i 

5=0 


- 1+A 




d-f 2 


$ (72) r'''+id7 = -,P^d {d + 2). 
Jo ^^d-1 
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Combining (B.29), {B.27), (B.28) with (B.26), we conclude that 


IM 


$ 


'-y\\ 


- $ 


d-M i^^y) 


9 (y) dvo\M {y) 


d 

= 62 


d 

= 62 


— • ~^d {d + 2)g {x) A(x) + O (e^) 

('^ + 2) ^ (a;) 9ix) + 0 (e^) 


which establishes 


{^e9)i 


x) = {x, y) g (y) dvolM (y) 

J M 


= / $ 
JM 


dh {x,y) 


9 (y) dvolM (y) + 


J M 


$ 


■-yir 


- $ 


dh i^^y) 


9 (y) dvohi (y) 


d 

= £2 


d 

= £2 


with 


TOoy (t) + £^ f Am5 (t) - -Seal (x) g {x) + —d {d+2)A (x) g{x)] +0 (c^) 


mog (x) + (AMy (x) + E (x) g (x)) + O (e^) 


E {x) := - -Seal {x) + —d {d + 2) A {x). 


n 


Remark B.9. The only difference between the conclusions in Lemma B.8 and 
Lemma B.l is that the scalar function E (x) takes the place of the scalar curvature 
Seal (a;); one can check, essentially by going through the proof of Theorem 4.5, 
that this change does not affect the conclusion of Theorem 4.5. Specifically, in 
that proof the same Seal (a:) from the numerator and denominator cancel out with 
each other in the asymptotic expansion, and this cancellation still occurs if one 
replaces Seal (a;) with E (x). In fact, by applying Lemma B.8 repeatedly we have 
the following expansions for f,g G C°° {UTM) 


(B.30) 

and 


{Xj y) f (y, Py,xV) dYO\M (y) 


fM 


d 

= 62 


'UTM 


{x,v) + [Ah f {x,v) + El (x) f {x,v)] +0(£^)|, 

Ke,s (x, v; y, w) g (y, w) dQ (y, w) 


— T ^ ^ I / \ ^^21 r A fJ / \ / \ / \1 

(B.31) =e "02 i^mog[x,v) + e—[Asg{x,v) + Ei{x)g{x,v)\ 


+ [^s9{x,v) + E2 ■9 {,x,v)\ +0{e^ +5h 


where 


r. lo , d{d + 2) 1 f 

ddi (Ci) — — xScalM ih) 4-— ■- / 

P IZ UJd-1 JSiiO) 


Um {0,0)fde 
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only depends on the scalar curvature ScuIm and the second fundamental form IIm 
of the base manifold Mat and 


E 2 = --Seals 


(d-l)(d+l) 1 


12 


Wd -2 JSi(O) 


\\Us {o,0)fde 


is a constant because 


Seals = (d — 1) (d — 2), ||ns (d, d)|l^ = 1 for any unit tangent vector 9. 


These expansions are essentially the equivalents of Lemma B.5 and Lemma B.6 
for K^^s- Using (B.30) and (B.31), and picking d = O (e) as e —>■ 0, a version of 
Theorem 4.5 holds true when is replaced with i.e., as e —)■ 0 (and thus 
S —)■ 0), 

(B.32) 


Ksfi^ 


f{x,v) +e 


TO 21 

2too 


Af {x,v) 

(x, v) 


f{x,v) 


p^~°‘ (x, v) 


+ d 


W 22 

2mo 


As [/P^ 

(x, x) 


f{x,v) 


A^p^ “(x,x) 
(x, v) 


+ 0{e^ + 6^). 


As we shall see below, this observation is the key to establish estimates for the bias 
error in the proof of Theorem 4.12. 


The last missing piece for the proof of Theorem 4.12 is a large deviation bound for 
our two-step sampling strategy. Recall from Assumption 4.10 that we first sample 
Ng points ■Cl; ■ ■ ■ Anb i-i.d. with respect to p on the base manifold M, and then 
sample Np points on each fibre i.i.d. with respect to p(- | ^j). The resulting 
Ng X Ng points on UTM 

^l,2j * * * ; Xl^pfp 

X2,l, X2,2, ■■ ■ , X2 ,Nf 


XNbPj ^iVij,2j * * * ) 

are generally not i.i.d. sampled from UTM. This forbids applying the Law of Large 
Numbers directly to quantities that take the form of an average over the entire unit 
tangent bundle, such as 

^ Nb Nf 

JY ^ ^j,s) f (Xj^s) ■ 

^ ^ 1 = 1 s=l 

However, due to the conditional i.i.d. fibrewise sampling, it makes sense to apply 
the law of large numbers to average quantities on a fixed fibre, e.g., 

, Np 

" S = 1 

where stands for the expectation with respect to the “fibre component” of the 
coordinates of the points on . Explicitly, 


^ ' .Re,(5 (Xi^r^ Xj^s) f {Xj,s) 


Ez 


Ez 


Re,<5 {Xi,r, (?!,•)) / (Cl, •) 



Re,5 {Xi^r,{^J,w)) f {fj,w)p{w \ fj) (w) . 
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Next, note that • • • , ^Nb i-i.d. sampled from the base manifold M, the partial 
expectations 

[Ez [k,,s (0, Z)) f 

are i.i.d. random variables on M with respect to p. Thus 

Nb 


Explicitly, 




IE\ 


Ez 


{x,,r,{y,Z))f{Y,Z) 


Ey 


E; 


K„(r,z))/(r,z) 


= / piy) K^^sixi^r,{^j^w))f{y,w)p{w\y)day{w)dvo\Miy) 


IM 




Ke,s ixi,r, i^j,w)) f {y,w) [p (y) p {w I y)] day (w) dvolM (y) 

MJ Se- 

^.7 

Ke ,5 {Xi,r, f {y, w) p {y, w) day (w) dvolu (y) ■ 

I MJ Se. 

This observation suggests the following iterated limit process 

N B iVjT 


1 


lim lim 

>-00 Njr—>-00 NqNf 


^ ^ ^ ^ ^e,5 {^i,n f (^_7,s) 


MJSc. 

^.7 


j = l S^l 

Ke,s {Xi^r, {^j,w)) f {y, w) p {y, w) day (w) dvolM (y), 


and the two limits on the left hand side generally do not commute. 

For this reason, it is natural for us to consider iterated partial expectations rather 
than total expectation on the entire UTM. For simplicity of notation, let us denote 
Ey,Ez as Ei,E 2 respectively, see the following definition. 


Definition B.IO. Let p be a probability density function on UTM, and 
p{x)=f p{x,w)daa;{w), p{v\x) = 

P [x) 

as defined in (4.7)(4.8). For any function / e C°° (M), define 

El/:= / f (y) p (y) dvolM (y) ■ 

Jm 

For any function g € (5^) for ^ € M, define 

^i9-= [ 9{C,w)p{w \0da^{w). 

JSi 

Definition B.ll. Let p be a probability density function on UTM. We call a 
collection of Nb x Np real-valued random functions 

{Xj,, \ l<j<NB,l<S<NF} 

Procrustean with respect to p on UTM, if 
(i) For each 1 < / < Nb, the subcollection {Xj^g | 1 < s < Np} are i.i.d. on S^. 
for some S M, with respect to the conditional probability density p (• | ^j); 
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(ii) The points {^j | 1 < j < Nb} are i.i.d. on M with respect to the projected 
probability density |5 (•)• 

Due to (i), we denote for simplicity of notation 

:= „ El^Xf := E^Xl^, 

and for the same purpose, due to (ii), 

E1E2X := EiE^^Xj, El {E 2 Xf := Ei . 

Lemma B.12. Let {Xj^s | 1 < j < Nb, 1 < s < Np} be a collection of Procrustean 
random functions with respect to some density function p on UTM. If 


< ^0, 


eI^x, 


< Ml, IE 1 E 2 XI < M 2 a.s. for all 1 < j < Nb, 1 < s < Np, 


then for any t > 0 and 0 < 9 < 1, 


{ . Nb Np 'I 



1 


(1 - ey Npy 


eI^x] - 




+ i(Mo + Mi)(l-6»)t 


1 


9^NBt^ 


+ exp < - - 


El (E2X)2 - (EiE2X)2 


1 


(Ml +M2)0t 


Proof. Note that 


1 


Nb Nb 




EE — IExIE2-^ ^ t 


Nb Nb 


1 






NbNf 


1 

NpNp 


Nb Nb . Nb \ \ 

E E E ^ E - E 1 E 2 X > t 


j = l S = 1 

Nb Np 


i=i 

Nb 


i=l 


Nb 


E E A,. - E Ei' A > (1 - 9) d U i is; E A - E.I 


j=l s=l 


i=i 


Nf. 


i=i 


{ Nb Np .. Nb \ 

- NNn.s -VEfx, > (l-9)t\ +P<^ -Ve^^X,-E iE 

NbNp ^ ^ TVs ^ 2 ^ M 1 fVs ^ ^ ' 

2 = 1 8=1 J = 1 ) I J = 1 


EfXj - E1E2X 


=: (I) + (II), 

where 9 S (0,1) will be fixed in specific applications. Since 

< Ml +M 2 , 

by Bernstein’s Inequality [24, §2.2], 

{ Nb 

E] (e^’Xj - E 1 E 2 X) > 9NBt 


I 2 X > 9t 

\2X > 9t 
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< exp < — 




r 1 2 1 

El - E1E2X + - (Ml + M2) 9NBt 

- - - 3 


= exp 




Nb [Ei (E2X)" - {EiE 2 X f] + J (Ml + M2) 0NBt 

-I O 


= exp < - 


-e^NBt 


El (E2X)^ - (EiE2X)^j + J (Ml + M 2 ) ( 

-I O 


For (I), note that 


{ Nb / -. Np \ ) Wb r , Afir 

E E j > (1 - ^) < E PI E 

NsfNp 'I 

= E p E 

j=i U=i J 

and applying Bernstein’s Inequality to each individual term in the summation yields 


i\p 1 

J2 (A.« - > (1 - ^) Npt E exp E 




< r i2 1 

^ E« \x,^s - + - (Mo + Ml) (1 - 0) Npt 


= exp < - 


(1 - 9f Nj,f 


Nf eI^X^ - ^ i (Mo + Ml) (1 - 6>) IVFt 


= exp < - 


and thus 


-(l-d)^IVM^ 


’X] - (e«^X,)" + i (Mo + Ml) il-9)t 


(I) < ^ exp E 


Ei^X^ - 


ii-9yNFf 
\21 1 


+ - (Mo + Ml) {1 — 9)t 


Remark B.13. Intuitively, the second term in the bound stems from the sampling 
error on the base manifold, and is thus independent of 6 and Np] the hrst term in 
the bound comes from accumulating hbrewise sampling error across all Nb fibres. 
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Proof of Theorem 4.12. We shall first establish the result for a = 0. In this case, 
(•: •) = (t): and 

Nb Nf 

EE ^e,(5 (^2,r5 ^j,s) f 

Ksf i-rr) = — - 

EE 


1 


1 = 1 s=l 

Nb Nf 


NbNf 


EE^ 

1 = 1 s = l 


lie* OIP W^ij, Xj^s 




1 


NbNf ^ ^ \ e " S 

Since {a;j,s}^j^ are i.i.d. with respect to p (• | ^j), by the law of large numbers, for 
each fixed j = 1, • • • , Nb, as Nf —t oo, 

Nf 


lim -E K 

7-1 ]\fr~i • ^ 


Nf—>-oo Np 


ll^i~?llP W^ij, iiXi,r Xj^s 


K 


IlCi Clip \\^ij,^iXi,r 


f (Xj^s) 


f{^j,w)p{w I (w ), 


Note that {CilE are i.i.d. with respect to p, again by the law of large numbers, 
as Nb —t oo. 


^ Wb ^ Nf 

lim } lim } K 

A^s—^oo Np Nf^OCi Np 


= / v{v)-— f 

JM OJd-1 JS( 


s=l 


IICi~CllP WP^j.iNi.r Xj^s 


K 


IIC*-y|P \\Pv,iiXi,r-w\\ 


K 


fUTM 


e ’ 5 

2 li D ^ ..,I|2 ^ 


/ (Xfs) 


f {y, w) da^. (w) p{w\y) dvoW (y) 


IlCi-ylP \\ Py,iNi,r - w 

e ’ (5 


f {y,w)p{y,w) dO {y,w ), 


where we used p {y, w) = p (y) p{w \ y). Setting / = 1, 


1 f 

lim —V lim — V if 
Nb^oo Nb ^^ Nf^oo Nf ^ ^ 

1 = 1 s = l 


IlCi Clip \\Pij,iiXi,r Xj 


K 


lUTM 


lie* -2/IP \\Py,i,X^,r - W\\ 


p{y,w)de {y,w). 


Therefore, 


lim lim H°sf{x^^r) 

Nb^oo Nf^oo ' 


K 


>UTM 


IICi-2/lP \\PyX,X,,r - w\\ 


f {y,w)p{y,w) de {y,w) 


K 


lUTM 


IICi-2/lP \\Pv,i,X^,r - W\\ 


p{y,w)dQ {y,w) 


= Hlsf {Xi,r) 
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— f “1“ 6 


W21 r Af [fp] (Xj^r) 

2mo [ pixi^r) 


f {Xi,r') 


^SP i^hr) 

P{x%,r) 


+ <5 


TO22 

2mn 


UP\ 

PiXi,r) 


f {Xi^r) 


V-' 


Pix^^r) 


O 


where in the last equality we used the assumption 5 = O (e) as e —>■ 0, as well as 
the observation in Remark B.9 that Theorem 4.5 holds true when K^g is replaced 
with K^g. The bias error is thus O (e^ + S^). 

It remains to estimate the variance error for the special case a = 0. Write for 
any fixed Xi^r & UTM 

P'j,s — {Xi,r^ ^3,s') f {Xj^s) : ^j,s — {Xi^r^ ^j,s) • 


Note that = 0, Gi^s = 0 for all s = 1, • • • by Definition 4.11 (1). Also, by 
the compactness of UTM we have some trivial bounds uniform in j, s: 


In these notations, 


Halloo II/IIOO. |G,„|<||i^||^. 


r 1- TjO f ( \ 

hm hm H gf{xi^r) = , 

Nb—^ooNf^oo lli,xlli<2^ 


and we would like to estimate 

p{Nb,Nf,/3) := . 


^j^s^3,s E 1 E 2 A 

■E 1 E 2 G 


>/3 


for sufficiently small f3 > 0. An upper bound for 

E 1 E 2 F 

\ J2j J2s ^3,s E1E2G 

can be obtained in a similar manner. 

Since Gj^s are all positive, 


<-/3 


p{Nb,Nf,p) =: 


(E, E« f ,,,) E1E2G - (e, E. g,..) E1E2F 


E, EsG,.. E 1 E 2 G 


>/3 


^ ^ E 1 E 2 G - ^ 51 ‘^xs EiE 2 T’ > /3 ^xs E 1 E 2 G 


Denote 


E-« := E,.EiE 2 G - G,-.EiEai^ + ^ (E1E2G - G,,,) E1E2G, 
then it is easily verihable that EiE 2 E,s = 0 for all 1 < j < Nb, 1 < s < Np, and 


p{NB,Np,j3) =: 


1 


NbNp 


5]5]E-. >/3(EiE2Gf 


By Lemma B.12, bounding this quantity reduces to computing various moments. 
Define 


A, :=E2E, 
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then Xi, - ■ ■ , XjVs are i.i.d. on M with respect to p, and ¥.iXj = 0 for 1 < j < Nb- 
Furthermore, Xi,-- ■ ,Xj^^ are uniformly bounded. To find this bound explicitly, 
note that 


\Xj\ = lEaFjl = KEaFjOEiEaG- (E2 G,)EiE2F + /?(EiE2G-E2G,)EiE2G| 
< |(E2Fj)EiE2G| + |(E2G,)EiE 2F’| +/3(EiE2Gf +/3|E2 Gj| IE 1E2GI, 


and recall from Lemma B .8 and Remark B.9 that 

E 1 E 2 F = O , E 1 E 2 G = O , 

E2Fj = O ^ , E 2 GJ = O ^ , 

thus 

|Xj| < CeU‘^-^ + (3 

where C is some positive constant depending on the pointwise bounds of K, p, and 
/. Since we will be mostly interested in small /3 > 0, let us pick /3 = O (e^ + 6“^) 
and write the upper bound on \Xj \ as 

(B.33) \X,\<CeU'^-\ G = G(||iF|U,||/||^,p™,pM)>0. 

We then need to bound EiX|. Note that 


EiX| = El 


(E 2 F,)" (EiE2G)VEi (E 2 G,)'' (E 1 E 2 F)" 


-2Ei[(E2F,)(E2Gj)](EiE2F)(EiE2G)+/32Ei (E 1 E 2 G - E 2 G,)" (E 1 E 2 G)" 
+ 2/3 (E1E2G) Ei{ (E1E2G - E2G,) [(EaF^-) E1E2G - (E2Gj) E1E2F] } 

El (E 2 F,)"] (E 1 E 2 G)" + [Ei (EaG,)^] (E 1 E 2 F)" 

- 2 Ei [(E2F,) (E2G,)] (E1E2F) (E1E2G) + /32 (E1E2G)" [Ei (E2G)" - (EiE 2 G)^ 
+ 2/3 (E1E2G) [Ei (E2G)^ E1E2F - (E1E2G) El (E2 FjE2Gj- 


it suffices to compute the first and second moments of E2FJ , E2Gj for 1 < j < Nb- 
By (B.31), 

E 1 E 2 F = |mo [fp] {x,^r) + (Af [fp] {x,^r) + Ei {ii) [fp] {xi^r)) 

+ 5^- (A 5 [fp] (Xi^r) + E 2 ■ [fp] {Xi^r)) + O (e^ + (5^) l^, 
and if we set / = 1 , then 

E 1 E 2 G = ei ^^mop (xi^r) + (Agpixi^r) + Ei (fi) p {xt^r)) 

+ S 2 {^'sP (^i,r) + E 2 P (Xi^r)) + O + (5^) . 

Before we turn to computation of second moments, let us introduce another nota¬ 
tion: write the conditional probability density function on fibre as 


p(x,v) p(x,v) 
p(v[x) = — - 


p(x) po7r(x,v) 


P 


po TT 


(x.v). 
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where tt : UTM —>■ M is the canonical projection from UTM to M. This will help 
us avoid creating new notations for the base and fibre components of the coordinates 
of Xi^r, as needed \np{v \ x). Applying Lemma B.8 once, we have 


EaFj- = 5"^ |mo ( ^ 


fp 




+ Im2 ( 




fp 


p O 7T 


po TT 


fp 

po TT 


(Pi.. 


^i,r ) 


o(s^ 


where Mq (•), M2 (•) are functions depending only on the kernel K, as in the proof 
of Lemma B.6. For simplicity of notation, let us write them as Mo,M2 for short. 
Now we square both sides of the equality above 


iE2F,f=S' 


d-l . 


(Mg2 

fp ' 

1 

po TT 


+ SM 0 M 2 


fp 


po t: 

and apply Lemma B.8 to get 


Ei(E2F;)" = e5<5^-i<{m' 
+ <^"*22 [fp] {Xi,r) ( 




fp 

po TT 




fp 


po TT 




\ lllo 

\fpf 

ix^,r) + e'^ (a^ 

'{fpf 

(xi^r) + El (^1) 

\fpf 

1 

po TT 

2 \ 

po TT 


po TT 



fp ' 

V 

po TT 


(Xi^r) + E 2 


fp 

po TT 


+ O (e^ + (5^) l^. 


where mg, m^i, 1x122 are positive constants determined by the kernel function K 
and dimension d: 


JBtiO) 

4i= [ M,{r^){s^fdsP--ds^ 

42= f Mo{r^) M2{r^)dsP--ds‘^. 

JbHo) 


m,i = 


mno = 


Setting / = 1 , we obtain a similar expansion for the variance of E2G'j- 


El (E2G,r = m'g 


Similarly, 


I , 

r 1 

i Wg 


1 

po TT 


{Xi^r) + ' 


f^s 

p 

V 

po TT 


{Xi^r) d~ E 2 


2 

P 

po TT 


( A^ 

p2 

1 

po TT 


(Xi^r) + El (^ 1 ) 


P 

po TT 


(Xi, 


(xi,r)'j + O (e^ + (5^) l^. 


El [(E2F,)(E2G,)] = e5<5'^-^ m'g 


I , 

r fp^ 1 

|mo 


po TT 


+ e- 


mn 


21 I Af 


h>^ 

po TT 


,m: 


(Xi^r) + El 

fp 


(Xi^r) 

fP^ 


P O TT 


po TT 

{x^,r) + [fp] {Xr,r) A5 


P 


po TT 


(^i,r ) 
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+ 2i?2 


JP 

p o ttJ 


(a^i.r) ^ + O (e^ + (5^) !■. 


It remains to plug all these moment expansions back into EiX|. Clearly, we are 
only interested in scenarios in which /3 is sufficiently small, say /3 = O (e^ + 6 ^), thus 
the O (/3) and O (/3^) terms in EiX? can be thrown into O ^ 

Direct computation yields 


EiX^ = 


El (EaF,)"] (E1E2G)" + [Ei (E2G,)"] (EiE 2 F )2 
- 2 Ei [(E2i^i) (E2G1)] (EiE2i^) (E1E2G) + 0 {e^+ S^) 

) + /2( r^l ( ) 

1^2 \ po 7T [pOTT 


-2f{x,,r)^S 


fP^ 


po -K 
2 / 


+ O (e^ + ^2) I 


K.)||vf/f K,) 


p o ttJ 


= g¥^2(d (ai^_^)||vf/||^(a;i,^) + 0(e^ + (5^) 


Note that at (*) we used (denote p = p^/ (p o tt) for short) 

Af (f p) + fA§g - 2/Af (/p) 

= (Af/2) p + /2Af p + 2 (Vf/^ Vf p) + /^Af p 
- 2/2Afp - 2/pAf / - 4/ (Vf/, Vf p) 

= [Af /2 - 2/Af /] p + 2 (Vf /2 - 2/Vf /, Vf p) 

= 2 ||vf/f p. 


Therefore, we can bound EiA 2 uniformly in j as 


EiA| < e¥^2G-i) (C'e + O (e^ + ^2)) 


where 

TOgm'aiplr ||Vf/||^ 

^ — 4 

Wd-lPm 

is a positive constant. Interestingly, O (S) terms do not show up in this bound. 
In hindsight, this makes sense because Xj = E 2 Yi,s is already the expectation 
along the fibre direction, which intuitively “froze” the variability controlled by the 
fibrewise bandwidth S. 

It remains to bound 

E«/y/ - 
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for each 1 < j < Nb- For /3 = O (e^ + (5^), a bound for \Yj^s\ can be found as 
|Fj,s| = |-Fj',sEiE2(j' — Gj^sEiE 2 F'+ ^ (E1E2G — E1E2GI 
= O \\K\\^ ll/ll J + O \\K\\^ ||/|U) 

+ /3 (O \\K\\Jj + O |iiF||^)) 


where 


C = C(ll^lloo.ll/lloo.P->m) 

is a positive constant independent of j. Again taking advantage of /3 = O (e^ + 6 ^) , 
we have 

E 2 r/ = E 2 f;^, (E1E2G)" + E2G2 ^ (EiE 2 F )2 - 2E2 (a,-,g,-,) (E1E2F) (E1E2G) 

+ O [e'^< 52 (d-i) (^2 ^ ^ 

(E 2 y,)^ = [(E 2 f;-) E1E2G - (E2G,) E1E2A + p (E1E2G - E2G,) E1E2G]" 

= (E2A,)" (E1E2G)" + (E2G,)" (E1E2F)" 

- 2 (E2F,) (E1E2G) (E2G,) (E1E2A) + O ^ ^ 2 ^j ^ 

thus 

Ear/ - (Ear^)" = [EaF/, - (EaA,)"] (E1E2G)" + [EaG/, - (EaG,)"] (E1E2F)" 

+ 2 [(EaF,) (EaG,) - Ea (F,-,G,-,)]) (EiEaF) (EiEaG) 

+ O [e‘'52(d-i) (^2 ^ ^2/ ^ 

Observe that 

E2F/, = o(( 5^) , EaG/, = o(^^), Ea [F,-,G,-,] = O (^^) , 

while 

(EaF,- «)'= O , (EaG,, ,)'= O (^'"- 1 ) , (E^Fj) {E^G,) = O ( 6 ^-^) , 
thus the leading order error in Eal/^ — (EaY, )^ are determined by 

(EaF/J (EiEaG)' + (EaG/J (EiEaF)^ - 2E2 (F,-,G,-«) (EiEaF) (EiEaG). 

Note that by Lemma B.8 


EoA? = / K 


- J,s 


, d-l 




= S-T- ^ Mo ( McJiL I f (P;, 


P 


pO TT 




LoJlAiii 




Pp 

pO 7T 


{Pij,ii^i,r) + Fa • 




Pp 


pO TT 


{Pij,U^i,r) 
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where Mq, M 2 are constants depending on but uniformly bounded over M: 


Mo 


M, 


lie* - 0 


lie. - 


-] = [ K^( 

J Jb‘^-\o) \ 




< imiYoi ^-'( 0 )), 


3^2 J^2 i lie* ^^2 I 




{9^ 


< \\kCyo\{B f-Ao)) 


For simplicity of notation, we shall denote these constants merely as Mq, M2, 

?2 


dropping the dependency on The expansion for E 2 ^ys is thus 


1 Mo 

’ Pp ' 

1 

po TT 


+ 2-^2 




pp 

po TT 




+ E 2 


Pp 

p O TV 


Similarly, 


sPPImo 

P 


+ 7:M2 


p 

1 

po TT 


2 


po TT 


+ E 2 


3,s'-’3,‘ 


( Mo 

fp ' 

{P^j , 

+ xM2 


fp ' 

1 

po TT 


2 


po TT 


+ E 2 


+Op ) 


+ 0(5^)}, 


P 


po 7T 




fP 


p O TV 


(Pe., 


j 


A direct computation yields 

{E 2 EIA (E 1 E 2 G)" + (EaG^,) (E 1 E 2 F)" - 2 E 2 (F,-,G,-,) (E 1 E 2 F) (E 1 E 2 G) 


,3(d-l) 


= e 5 ^ i ruoMoP (xiA 


P 


po TT 


{P^J,^^^i,r) [f {Pi,,i^^i,r) - f {x^,r)y 


+ emom2iMop [xiA 


P 


{P^j.U^i.r) 


po TT 

- i^s [fp] P^,r) - f {XiA ^sP PiP) [f {Pi,j,iiXiP - f {x^P] 

+ SmlM2P^ {Xi r) 

[po TT 

+ 5mom22Mop {xtP 


{AgppiP + El p)p{x^P) [f {P^i,iiXiP - f{x^P]' 




p 

po TT 




i^SPPix) + E2 ■ PpiP) [/(P^.^^.XiP -fiXiAY 


- i^S [fp] Pi,r) - f iXi,r) Agp{x^A) [f {Pij,i,Xi,r) - f {Xi A] + O {e^ + 6^ j. 

Recall from Lemma B.2 that the difference between and Xi^r along the 

fibre direction is O {^d\j Pj,A)) = G(e). Therefore, the distance (under the Sasaki 
metric) between P^-^^.Xi^r and Xi^r in UTM is bounded by the square root of the 
square sum of (Im iCjPi) nnd the difference between P^.^^.Xi^r and Xi^j. along the 
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fibre direction, which is of order O As a result, all terms involving 

[/ - f{Xi,r)] 

are of order O ^ Thus 

(E1E2G)" + (E2G2 J (EiE 2 F )2 - 2E2 (F,-,G,-«) (E1E2F) (E1E2G) 
< (G'e + G"(5), G' > 0, G" > 0. 


As a result, 

E^^r/ - (E^^l^y = O (e + 5)) . 

We are now ready for applying Lemma B.12 to L} Since 

IE1E2GI =o(^eW^'j , 

we have constants such that 

C''eU^ < IE1E2GI < C”eU^. 

For any 9 G (0,1) to be fixed later. 


piNB,NF,l3)=. 


1 


NbNf 





(EiE2G)^ 


E^W/ - 


(e«w,)' 


+ ^ • 2CeU^ {1-0)13 (EiE2G)^ 

O 


^9^NbP^ (EiE2G)^ 


El A/ + -Ce^S‘^-^d (3 (E1E2G) 


< Nb ■ exp { - 


exp < - 


^ (1 - 9)^ NfP^ {C'{)* g2dj2(d-l) 


Ce‘^s"-^ (e + <5) + (1 - 6») /3 • (G^')^ 

O 


^9^NbI3^ (C”)* 


(G'e + O (e^ + 5^)) e¥ 52 (d-i) ‘^CeU^-^9p ■ (G")^ 

o 


Again by restricting ourselves to /3 = O (e^ + , this bound can be rewritten as 

(B.34) 


Gi (e + < 5 ) + O (^ei (e 2 + <52)) 


> +exp < - 




G2e + 0(e2 + 52) 


As pointed out in Remark B.13, the second term in this bound is the sampling 
error on the base manifold; the noise error resulted from this term is of the order 


O 


{NbC^ " = 0(^5 "62 4^, 
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which is in accordance with the convergence rate obtained in [82] . The first term in 
the bound reflects the accumulated fibrewise sampling error and grows linearly with 
respect to the number of fibres sampled, but can be reduced as one increases Np 
accordingly (which has an effect of reducing fibrewise sampling errors). The choice 
of 0 is important: as 9 increases from 0 to 1, the first term in the bound decreases 
but the second term increases. One may wish to pick an “optimal” 9 G (0,1), but 
this does not make sense unless one chooses e,S,NF appropriately so as to make 
the sum of the two terms smaller than 1. Let us consider 0* G (0,1) satisfying 

(B.35) (1 - 9^f = 9lNBei , 


or equivalently 


(B.36) 



Setting 0 = 0* in (B.34), we have 


(B.37) 


p {Nb, Nf, /3) < {Nb + 1) exp 


9lNBel(3^\ 
Cie + S) J 


= exp - 


9lNBe^P^ 
C{€ + S) 


log {Nb + 1) 


where C is some positive constant. Since 

Nb 

lim ,-= oo, 

Af_B—foo log Nb 

for any fixed e, 6 we have p {Nb,Nf, /?)—>■ 0 as Nb —t oo, as long as one increases 
Nf accordingly so as to prevent 0* from approaching 0 or 1; for instance, this can 
be achieved by requiring 

Nf 

(B.38) ^lim_^ — =pe (0,oo). 


Under this condition, we have the pointwise convergence in probability of gf. 
We now turn to the general case a ^ 0. Recall that 


H^,sf ix^,r) 


Nb Nf 

H Xj^s) f {Xj,s) 

3 = 1 B=1 _ 

Nb Nf 

i^i,r,Xj,s) 

3 = 1 s=l 


Ne,S {Xi^r, Xj^s) / {Xj,s) 
{Xi,nXj^s) 




Nb Nf 

P (^3,s) — ^ ^ ^ ^ § {Xj^sj . 

k=l t=l 


where 
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As Nb —>■ oo, Np —>■ oo, by the law of large numbers, 


1 1 f ~ 

Nb—^oo i\q Nf—^oo i\p JuTM 

= p {Xi^r) = E1E2 \k^^s {Xj^s, •) 


Therefore, as Nb —t 00 , Np —>• 00 , we expect (xi^r) to converge to 


/ K“s(.x^,r,v) f {v)piv)dQiy,w) 

=Ksfi^^x) 


lUTM 


K^,S ixi,r,v)p{v) dO {y,w) 


— / “f 


.W 21 

2mo 


[fp^ “]{Xi,r) , X Afpl °{Xi^r) 

(Xip) ^ (Xpr) 


+ 6 


W 22 

2mo 


[fP^ °‘]iXi,r) X ""(Xip) 

-J[X,p) 


P^ “ (Xip) 


P^ “ (Xip) 


+ 0{e^ + 6^), 


which gives the same bias error O + S^) as for the a = 0 case. 

Now it remains to estimate the variance error. Since our notation differs from 
the standard kernel density estimator by a factor , we shall compensate for 

it in the following computation. 


H^sf (xip) = 


.^e,<5 (a^i.rj Xjp) f (Xjp) 


Nb Nf 


Ks (a 


s) 


iXi,r)pZ5i^i,^) 


Nb Nf 

EE ^€,S f 

j^l 5^1 _ 

Nb Nf 

EE ^e,6 (^j,s) 

i-1 s-i 


and 


Nb Nf 

EE Nfz^S {Xip^ Xjp') p^^^ (Xjp) f (Xjp) 

j=l s=l _ 

Nb Np 

EE -^e ,(5 (^z,r 5 ) Pg ,5 {^j,s) 

j^l s^l 


Nb Nf 

EE {xipj Xjp') p^^g {xjp^ f (xjp) 

j=i ^=1 _ 

Nb Nf 

EE ^e,S Xj^s'} P^^S (^j,s) 

j=ls^l 


Nb Nf 

E! E! ^e,<S {xjp, Xjp) NpNpP^ g (xjp) — p^ g (xj,s) f (xjp) 

j=l s=l _ 

" Nb Nf 

EE E.5 {x^p, Xjp) NgNpp^ g (xjp) 

j = l S = 1 


Nb Nf 

+EE ^e,S {^i,rT^j,s)p^ § {Xj,s) f (^j,s) ^ 

i-1 
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Nb Nf 


j=l s=l 


-EE 


Nb Nf \ / 

E E ^7-®) ^B^fKs i^j,s) EE 

^i=ls=l ’ / \j=l^=l 

=: (^) + (B), 

thus if we can estimate (A), (B) by controlling the error 

(xj-,.) - p-g (xj-,) 

then it suffices to estimate the variance error caused by 


(B.39) 


^£,<5 (Xi^r, Xj^s) f (Xj,s) 

h'lhl Ks^^Br)Pls{^],s) 

Nb Nf £> / \ 

ix^,r) Pe,s l^J,s) 


Our previous proof for the special case a = 0 can then be applied to (B.39): the only 
adjustment is to replace the kernel (x, y) in that proof with the a-normalized 
kernel 

Ke,s {x,y) 

Pis i^)plsiy)' 

We would like to estimate the tail probability 

1 


Pe,5 {Xj,s) - Pe,S {Xj,s) > | , 


Nb^f 

but since (^j,s) = O (^eiS ^~^ , it is not lower bounded away from 0 as e, ^ 0. 

For this reason, and noting that (^) and (B) are invariant if we replace Pe,6^ Pe,6 

. d ^ d-1 ^ d ~ d-1 

Withe 2 ^ 2 pe ( 5 , e 2 0 2 we estimate instead 

q{NB,NF,fi) := P I e~'i5~Np^ g - €~i5~Np^ g (^xj^s) > 


' { NbNf ^"’^ ~ > e 2 5 ^ 


NB N F 


where 

Pe^S {Xjs') 'y ( y ) He ,5 (.Xj,s ^X]^f) , 
fe=l t=l 

Pe ,5 (Xj^s) — IE1IE2 ^e,S (Xj^s; *) 

We would like to apply Lemma B.12 again. For this purpose, first note that 


He,5 {,Xi,rt Xj s) ^ \\^\ 


E 1 E 2 


Eo 


He ,5 {Xi,r; ‘) 


^e,5 {Xix ; ‘) 


< C5^ 

d , d-1 


< Ce^5 2 , 


where 


C = C{\\K\\^ ,PM,Pm,d) 






























HYPOELLIPTIC DIFFUSION MAPS I: TANGENT BUNDLES 


71 


is some positive constant. Moreover, direct computation yields 




^e,5 *) 


= o 




El 




1 2 


= o 


Therefore, by Lemma B.12, for /? = O (e^ + 5^), 

( { 1 -ef NFe‘^S‘^-^/3'^ 


q{NB,NF,l3) < iVsexp < - - 


= Nb exp < - 


2CiS^ 

(1 - ef NFe^S^ 13^ 
2C, 


■ exp 


exp< -- 


9^NBe‘^S^-^P‘^ 

2CieUd-^ 

0^NBei/3^\ 

2Ci [ 


where Ci > 0 is some constant. A simple union bound gives 

1 


u 


NbNf 


Pe^S (^j,s} Pe,5 


> £2,5 2 /3 


< NbNf 


Nb exp < - 


{l-ef NFe'^S^P'^ 
2C^ 


exp ■ 


6i2iVBe2/32 

2CS 


If we let 9 = 0*, where 0* is defined in (B.35), 


9 9'^Nb 

{l-9^fNF = ^-iF. 

e^o 2 


and hence 


(B.40) 


u 


1 


NbNf 


Pe,S Pe^d 


d, - d -1 

> e^S 2 f 


< Nb {Nb + 1) Nf exp < - 


OlNBeip^ 

2Ci 


We are interested in seeing how this bound compares with the bound in (B.37). As 
Nb,Nf —>■ 00 , as long as (B.38) holds, 


Nb {Nb + 1) Nf exp < - 


OINbc-^P^ 

2Ci 


{Nb + 1) exp ■ 


OlNBe-^P^ 


=NbNf exp —9iNBf-^ /3 


C{e + 5) 
1 


L2C. C (. + «)]' 

thus the bound in (B.37) is asymptotically negligible compared to the bound in 

(B.40). This means that when a ^ 0 the density estimation in general slows down 

1 

the convergence rate by a factor (e + 5) ^, which is consistent with the conclusion 
for standard diffusion maps on manifolds [81, 45], since 77“^ is essentially the heat 
kernel of the diffusion process (instead of the graph hypoelliptic Laplacian itself). 
As a consequence of this observation, we know that for probability at least 

elNBe^P^] 


1 - Nb {Nb + 1) Nf exp < - - 


2C, 
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we have 


Ke,S (xi^rj Xj^s) / (Xj^s) 

Ks(^hr)K,si^j,s) 

ix,,r,Xj,s)p-s 

Ksi^hr)Pe,si^j,s) 


- H^,sf {x^k < P 


as well as 


P€,s ixj,s) - Pe,s {xjk <e^ 5 ^l 3 for all 1 < j < Nb, 1 < s < Np- 


AT AT y €,0 \^J,S J t'e.O ^ ^ M ^ J ^ X, ^ ^ ^ 

I iV Qi\p I 

Note that by our assumption 

0 < Pm < P {x, v) < Pm < oo for all {x, v) € UTM 
there exists constants Ci , C 2 such that 

0 < Cl < e “2 g {xj,s) < C 2 < 00 . 

For sufficiently small /3, these bounds also apply to Ng^Npp^^g {xj,s) with high 
probability: 

0 < Cl < ^ e~'^S~‘k^Pe,s {xjk < C 2 < 00 . 


Nb^f 


More specifically, we have 


0 < ^ < ^bNfpJ {xjk < < 00 , 

6262^ 2 Cl€2() 2 

1 -1 , ^ 1 
0 ^ < PpS i^3,s) < ^ 

0262^ 2 CiC^O 2 


NbNfpJ {Xj^s)-pj {xps) < = ^ 2^3 


The errors (A), (B) can thus be bounded as 


|(Al)|<C“e^5^ 


2 P _ 2“-iaC2||/||^ 

C2ei<5^; C^i<5^ Cl 

( 2 /? ^ 2“-i«C,"+^ ll/ll^ 

I /-y i. c^ni / /^9 ^ ^ d-l pyOc-\-2 ^ 


f 2 p _ 2“-i«CrMl/lloo . 

ad ^c(d-l) lu Iloo ^ d ^d-1 I d ^d-1 ^a+2 P 

Cft^5 2 \C2e2(5 2 J C(e'iO 2 

Since Ci, C2 only depends on the kernel function K, the dimension d, and Pm,PM, 
these bounds ensures that 


HAfix.k-Ksfi^^x) <CP 


with probability at least 


1 - Nb {Nb + 1) Np exp < - 


OlNpe^P^ 


where constants C, Ci only depends on the kernel function K, the dimension d, and 
Pm,P m- This establishes the conclusion for all a € [0,1]. □ 
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B.2.2. Sampling from Empirical Tangent Spaces. The following two lemmas from 
[81] provide estimates for the approximation error in the local PCA step. We 
adapted these lemmas to our notation; note that the statements are more compact 
than their original form since we assume M is closed. 

Lemma B.14. Suppose Kpqa & !])• If epcA = O b, then, with 

high probability, the columns of the D x d matrix Oi determined by local PCA form 
an orthonormal basis to a d-dimensional subspace o/K^ that deviates from 

by O ^^6 following sense: 

(B.41) ||070, - 0 ||hs = O (cIca) = O , 

where 0^ is a D x d matrix whose columns form an orthonormal basis to L:tTx,M. 
Let the minimizer if (B.41) be 

(B.42) dj = argmin||O70i - 0 ||f, 

OGO(d) 

and denote by Qi the D x d matrix 

(B.43) Q, := 0,07, 

The columns of Qi form an orthonormal basis to and 

(B.44) ||Oi — QjIIf = O (cpca) , 

where ll-Hp is the matrix Frobenius norm. 

Proof. See [81, Lemma B.l]. □ 


Lemma B.15. Consider points Xi,Xj € M such that the geodesic distance between 

them is O epcA = O <^+ 2 ^ ^ probability, Oij approximates 

Pxi,x in the following sense: 

(BT5) 

= {{i.Px,,x,X {Xj),ui (t,)));7^+0 (eIcA + e®) > V all X eV {M,TM ), 
where {ui {xi)}f^^ is an orthonormal set determined by local PCA, and 
Xi = {{l^X {xi),ui G 

Proof. See [81, Theorem B.2]. □ 

Proof of Theorem 4.14. By Definition 4.13 (2), 


_OjxBjT^ 

— II -p II • 

/ i.rW 


By Lemma B.15, 


OjiBi Ti^r — Bj + O ^C^CA + J 


OjiBjn^r _ Pj 


\Bi Ti^r\ 


IB- TiJ 


+ 0 


(epCA + e®) 


thus 
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where we used 


T 2 f -p, 1 — i % T p ^ % T p 7” j 7" 7 i T 


I ll“* ' IIF 11^* IIFI 

< 11^7^ - <37||f = ^ (^pga) 


-®7 II < — 1- 


Therefore, 


^ _ OjiBjri^r '^i.s 

lls^T^JI ||b7t,J 


Bj {Pi,,iiri,r) 


+ (^PCA + £") 


^7 '^j,s 

Pta 


“ “ ^ 7 (*^PCA + e^) 

= - fj^s + o (^epcA + e^) 5 


iCi,r Cj^sll Tj^s < 4 {OjiCi^r Cj^g) {P^j ,ii'''i,r ^3,8 


— O ( Cpp, A + e2 I . 


Thus a Taylor expansion for K at point 


116||0,4Q,,-c,-,f 


116-6f l|Oj4C7..-CA.f 


116 - 6I7 “ '^3,8 + O (^epcA + e" 

^ ^ 


116 “611 Pij,ii'''i,r ^3,8 


Pd^K 


116 611 Pij,ii'''i,'r '^3,8 


For any function g G C°° (UTM), this leads to 
[ J(fe^5{T^^r,V) 9{v)dP>{v) 


Ke,s iTi^r,v)9iv) dO iv) 


ll^i y\\ 


g (y, w) day (w) dvoW (y) 
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[ K^^siTi,r,v)9iv)dQiv) + (^^S''^^ ^o(e|cA + e")- 
JUTM ^ ' 


Following the notation used in the proof of Theorem 4.12, by the law of large 
numbers 


lim lim 


1 


Nb—^oo Nf^oo 


qe,S iTi,r) = E1E2 [-^.5 (Ti,r, ■)] 


= E 1 E 2 




+ e2(5'^2^ ^ep 


PGA 


3 ' 

+ 62 


and hence we expect (Ti^r) to converge to 

e,sfir,,r) + o[s-^[e 


H. 


■^PCA 


+ e = 


= /(T»,r) + e 


TO 21 

2mo 


[fp^ “] {T^,T) _ , . Afpi 

/ V%,r) 


“ (n.r) 


P^ “ (n.r') 


+ 5 


W 22 

2TOn 


(r.,.) ^ ^ pi-“ (r,,,) 


+ O if' + +0 {5 ^ (e^cA + e"")) ■ 

In fact, noting that 


d ^ d — 1__ __ 

£2(5 2 NbNp 


9e,S {Ti,r) 


1 


£2(5 2 NbNf 
1 


Nf Nb 

EE j,s') 


elS^NBNp 


-^6,5 ('^'i,r 5'^_7,s) H” 


O (<5 ^ (epcA + e^)) 


d ^ d-l 
£2^2 


1 


cU^NbNf 


Pe,S j,s') H“ 


^ (e?CA + e®)) 


d ^ d-l 

£2 0 2 


we have 
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Consequently, is approximately H^sf 

Nb Nf 

H 

ir.^r) = -- 


j=l s=l 


1 


Nb Nf 


NqNp 


EE gad^a(d-l)^2a^2a^a ,,,Tj-^) / (t^-^) 


i=i s=i 


1 


iVjr 


NbNf ^ 

J = 1 S = 1 


EE gadja(d-l)^2a^2a^a 


1 

iVsiVp’ 


Nb Nb 

E E (r,,,,T,-,) / (r,,,) + O (r^ (e|cA + e^)) 

i=i s=i 


1 


ATb Nf 


NbNf ^ 

j = l s=l 
^ Nb Nb 


E E ,) + O (e|cA + e®)) 


iVsiVi^ 


E E (r.,.,T,-.) / (r,,.) 


j = l S = 1 


1 


A^b Ab 


O 


EE gadja(d-l)^2a^2a^a^ Tj,^) 


(r .(4 


PCA 


NbNf . 

J=1 S=1 

Nb Nf 

EE^M ir^,rN,s) f {Tj, 

j-1 S^l_ 

' Nb' Nb 

EEE“d 

i=i s=i 


+ ^ (^PCA + *^")) 


= ff.v (^^-)+o (<5-'(4 ca+ e®)) ■ 

Therefore, under the assumption that 

(5 ^^pcA T 6^^ —^ 0 as c —^ 0, 
we can apply Theorem 4.12. This completes the whole proof. 


□ 
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